2024-08-17 12:55:56,791 INFO [train_multi_KD3.py:1187] (3/4) Training started
2024-08-17 12:55:56,791 INFO [train_multi_KD3.py:1197] (3/4) Device: cuda:3
2024-08-17 12:55:56,797 INFO [train_multi_KD3.py:1212] (3/4) Using dtype=torch.bfloat16
2024-08-17 12:55:56,797 INFO [train_multi_KD3.py:1214] (3/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': 'e400fa3b456faf8afe0ee5bfe572946b4921a3db', 'k2-git-date': 'Sat Jul 15 04:21:50 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.9', 'icefall-git-branch': 'multi_KD_with_wenet', 'icefall-git-sha1': '0d2af1df-clean', 'icefall-git-date': 'Wed Aug 14 17:27:16 2024', 'icefall-path': '/xy/mnt/yangxiaoyu/workspace/icefall_multi_KD', 'k2-path': '/root/anaconda3/lib/python3.9/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/lib/python3.9/site-packages/lhotse/__init__.py', 'hostname': 'NGK_xiaoyu'}, 'world_size': 4, 'master_port': 13440, 'tensorboard': True, 'num_epochs': 35, 'start_epoch': 1, 'start_batch': 332000, 'exp_dir': PosixPath('multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'stop_early': True, 'use_fp16': False, 'use_bf16': True, 'share_asr': True, 'beats_loss_scale': 1.0, 'ecapa_loss_scale': 10.0, 'whisper_loss_scale': 1.0, 'whisper_cb_loss_scale': 0.01, 'repeat_librispeech': 5, 'repeat_wenetspeech': 0, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': True, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'speaker_input_idx': 2, 'whisper_dim': 1280, 'use_task_id': True, 'num_codebooks': 32, 'mvq_kd_layer_idx': -1, 'use_subsampled_output': True, 'delta_t': 6, 'full_libri': True, 'mini_libri': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_librispeech': True, 'use_wenetspeech': False, 'use_audioset': True, 'audioset_subset': 'unbalanced', 'use_voxceleb': True, 'voxceleb_subset': 'vox2', 'use_fma': False, 'fma_subset': 'large', 'manifest_dir': PosixPath('data/fbank_LSVoxAs_with_whisper_large-v3_with_taskID'), 'max_duration': 1500, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 1, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': False, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'large-v3', 'use_mert': False, 'blank_id': 0, 'vocab_size': 500, 'dtype': torch.bfloat16, 'use_amp': True}
2024-08-17 12:55:56,797 INFO [train_multi_KD3.py:1216] (3/4) About to create model
2024-08-17 12:55:57,157 INFO [model_shift.py:142] (3/4) Delta_t: 6 when computing the distillation loss
2024-08-17 12:55:57,162 INFO [train_multi_KD3.py:1220] (3/4) Number of model parameters: 66484678
2024-08-17 12:55:57,162 INFO [checkpoint.py:112] (3/4) Loading checkpoint from multi_KD/exp_causal1_delta6KD_LS1_5fold+wenetspech0_0fold+as_unbalanced1+vox_1_vox2_base_lr_0.045_use_beats_1_scale_1.0_use_ecapa_1_layer_2_scale_10.0_1_scale_1.0_specaug0_musan0_with_task_ID_stop_early1_share_asr1_md1500_amp_bf16/checkpoint-332000.pt
2024-08-17 12:55:59,703 INFO [train_multi_KD3.py:1235] (3/4) Using DDP
2024-08-17 12:56:01,428 INFO [train_multi_KD3.py:1247] (3/4) Loading optimizer state dict
2024-08-17 12:56:01,729 INFO [train_multi_KD3.py:1255] (3/4) Loading scheduler state dict
2024-08-17 12:56:01,729 INFO [kd_datamodule.py:690] (3/4) About to get train 960 cuts
2024-08-17 12:56:01,789 INFO [train_multi_KD3.py:1306] (3/4) Getting audioset cuts
2024-08-17 12:56:01,789 INFO [kd_datamodule.py:900] (3/4) About to get the audioset cuts for KD.
2024-08-17 12:56:01,808 INFO [kd_datamodule.py:869] (3/4) About to get the voxceleb cuts.
2024-08-17 12:56:01,815 INFO [kd_datamodule.py:880] (3/4) Adding voxceleb2 cuts.
2024-08-17 12:56:01,817 INFO [train_multi_KD3.py:1320] (3/4) Using mux to combine Librispeech: True, WenetSpeech: False, audioset: True and voxceleb: True
2024-08-17 12:56:09,798 INFO [train_multi_KD3.py:1322] (3/4) Using mux to combine [CutSet(len=1406195) [underlying data type: ], CutSet(len=1904746) [underlying data type: ], CutSet(len=1187704) [underlying data type: ]]
2024-08-17 12:56:09,799 INFO [train_multi_KD3.py:1323] (3/4) Using weights: [1406195, 1904746, 1187704]
2024-08-17 12:56:09,799 INFO [train_multi_KD3.py:1332] (3/4) CutSet(len=4498645) [underlying data type: ]
2024-08-17 12:56:09,799 INFO [kd_datamodule.py:449] (3/4) Disable MUSAN
2024-08-17 12:56:09,801 INFO [kd_datamodule.py:489] (3/4) Disable SpecAugment
2024-08-17 12:56:09,801 INFO [kd_datamodule.py:491] (3/4) About to create train dataset
2024-08-17 12:56:09,807 INFO [kd_datamodule.py:528] (3/4) Using SimpleCutSampler
2024-08-17 12:56:09,808 INFO [kd_datamodule.py:536] (3/4) About to create train dataloader
2024-08-17 12:56:09,808 INFO [kd_datamodule.py:539] (3/4) Loading sampler state dict
2024-08-17 12:57:15,617 INFO [kd_datamodule.py:763] (3/4) About to get dev-clean cuts
2024-08-17 12:57:15,619 INFO [kd_datamodule.py:781] (3/4) About to get dev-other cuts
2024-08-17 12:57:15,620 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset
2024-08-17 12:57:15,860 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader
2024-08-17 12:57:15,861 INFO [kd_datamodule.py:840] (3/4) About to get the test set of voxceleb1 set.
2024-08-17 12:57:15,861 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset
2024-08-17 12:57:16,061 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader
2024-08-17 12:57:16,061 INFO [kd_datamodule.py:912] (3/4) About to get the audioset eval cuts.
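The mux step logged above combines three lazy CutSets with sampling weights equal to their lengths, so each batch draws from the corpora in proportion to their size. A minimal pure-Python sketch of that length-weighted multiplexing (illustrative only; the run itself uses lhotse's `CutSet.mux`, which additionally handles lazy iteration and seeding):

```python
import random

def mux(streams, weights, rng=None):
    """Yield items from several iterables, picking a source at each step
    with probability proportional to its weight; exhausted sources are
    dropped so every item is eventually yielded."""
    rng = rng or random.Random(42)
    iters = [iter(s) for s in streams]
    weights = list(weights)
    while iters:
        i = rng.choices(range(len(iters)), weights=weights)[0]
        try:
            yield next(iters[i])
        except StopIteration:
            del iters[i]
            del weights[i]

# Weights from the log: the length of each CutSet, i.e. sampling
# proportional to corpus size (LibriSpeech x5 / AudioSet / VoxCeleb2).
weights = [1406195, 1904746, 1187704]
probs = [w / sum(weights) for w in weights]
```

With these weights the middle stream (the largest CutSet) is sampled most often, and the probabilities sum to one by construction.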
2024-08-17 12:57:16,066 INFO [kd_datamodule.py:570] (3/4) About to create dev dataset 2024-08-17 12:57:16,543 INFO [kd_datamodule.py:591] (3/4) About to create dev dataloader 2024-08-17 12:57:16,543 INFO [train_multi_KD3.py:1412] (3/4) ['ASR_libri', 'SV_voxceleb1', 'AT_audioset'] 2024-08-17 12:57:16,543 INFO [train_multi_KD3.py:1416] (3/4) Loading grad scaler state dict 2024-08-17 12:57:30,037 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 0, loss[loss=0.09558, beats_loss=0.0104, ecapa_loss=0.0001384, whisper_loss=0.08379, over 18765.00 frames. ], tot_loss[loss=0.09558, beats_loss=0.0104, ecapa_loss=0.0001384, whisper_loss=0.08379, over 18765.00 frames. ], batch size: 74, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 12:57:30,038 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 12:58:09,359 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2516, beats_loss=0, ecapa_loss=0.0005218, whisper_loss=0.2464, over 922467.00 frames. 2024-08-17 12:58:23,185 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004106, beats_loss=0, ecapa_loss=0.0004106, whisper_loss=0, over 939242.00 frames. 2024-08-17 13:00:20,862 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02324, beats_loss=0.02324, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 13:00:20,865 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 13:00:21,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3320000.0, ans=0.0 2024-08-17 13:00:22,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. 
limit=15.0 2024-08-17 13:00:31,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3320000.0, ans=0.05 2024-08-17 13:00:34,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.29 vs. limit=15.0 2024-08-17 13:00:42,756 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-17 13:00:49,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320100.0, ans=0.1 2024-08-17 13:00:56,599 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 13:01:20,912 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 13:01:26,275 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 13:01:31,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-08-17 13:01:40,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3320400.0, ans=0.0 2024-08-17 13:01:43,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-17 13:01:47,192 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
22 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 13:01:51,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3320500.0, ans=0.1 2024-08-17 13:01:51,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 50, loss[loss=0.08836, beats_loss=0.01191, ecapa_loss=0.0001409, whisper_loss=0.07504, over 19368.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01071, ecapa_loss=0.0001414, whisper_loss=0.08951, over 920938.39 frames. ], batch size: 79, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:01:54,745 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-17 13:02:00,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3320500.0, ans=0.0 2024-08-17 13:02:04,275 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-17 13:02:18,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-17 13:02:37,115 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 13:02:48,182 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.300e+01 2.566e+01 2.925e+01 4.524e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-17 13:02:59,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3320900.0, ans=0.0 2024-08-17 13:03:01,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3320900.0, ans=12.0 2024-08-17 13:03:08,938 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 100, loss[loss=0.09698, beats_loss=0.01141, ecapa_loss=0.0001373, whisper_loss=0.08419, over 17380.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001474, whisper_loss=0.08953, over 1560959.40 frames. ], batch size: 65, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:03:14,558 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 13:03:42,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2024-08-17 13:03:47,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3321200.0, ans=0.125 2024-08-17 13:03:49,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3321200.0, ans=0.125 2024-08-17 13:03:50,211 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 13:04:09,020 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 37 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-17 13:04:09,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3321400.0, ans=0.2 2024-08-17 13:04:09,342 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.258e+00 2024-08-17 13:04:18,065 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 13:04:24,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 150, loss[loss=0.09119, beats_loss=0.01082, ecapa_loss=0.0001894, whisper_loss=0.07847, over 19047.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001473, whisper_loss=0.08943, over 2100854.35 frames. ], batch size: 79, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:04:41,738 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-17 13:04:46,743 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-17 13:04:52,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3321600.0, ans=0.125 2024-08-17 13:04:52,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3321600.0, ans=0.1 2024-08-17 13:05:17,984 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.279e+01 2.554e+01 2.913e+01 4.090e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-17 13:05:19,900 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-17 13:05:25,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3321900.0, ans=0.125 2024-08-17 13:05:35,752 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 13:05:38,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 200, loss[loss=0.1138, beats_loss=0.008214, ecapa_loss=0.0001738, whisper_loss=0.1039, over 15585.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.000148, whisper_loss=0.08984, over 2521796.76 frames. ], batch size: 60, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:05:42,293 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 13:05:50,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3322000.0, ans=0.0 2024-08-17 13:05:54,460 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 13:06:01,382 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 13:06:01,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3322100.0, ans=0.125 2024-08-17 13:06:04,263 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 13:06:04,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3322100.0, ans=0.2 2024-08-17 13:06:15,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.18 vs. limit=22.5 2024-08-17 13:06:21,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3322300.0, ans=0.125 2024-08-17 13:06:24,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3322300.0, ans=0.1 2024-08-17 13:06:30,643 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 13:06:39,811 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-17 13:06:50,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 250, loss[loss=0.08933, beats_loss=0.01172, ecapa_loss=0.0001154, whisper_loss=0.07645, over 21787.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001463, whisper_loss=0.09103, over 2837358.21 frames. ], batch size: 84, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:06:51,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3322500.0, ans=0.04949747468305833 2024-08-17 13:06:55,258 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
19 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 13:06:57,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3322500.0, ans=0.125 2024-08-17 13:07:02,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=8.0 2024-08-17 13:07:03,601 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-17 13:07:07,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3322600.0, ans=0.1 2024-08-17 13:07:14,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3322600.0, ans=0.1 2024-08-17 13:07:15,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3322600.0, ans=0.125 2024-08-17 13:07:18,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3322700.0, ans=0.0 2024-08-17 13:07:18,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.77 vs. 
limit=15.0 2024-08-17 13:07:21,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3322700.0, ans=0.125 2024-08-17 13:07:23,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3322700.0, ans=0.0 2024-08-17 13:07:30,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3322800.0, ans=0.1 2024-08-17 13:07:40,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.294e+01 2.563e+01 2.958e+01 9.042e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-17 13:07:58,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 300, loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001595, whisper_loss=0.08924, over 18177.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.000147, whisper_loss=0.09088, over 3031524.77 frames. ], batch size: 72, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:08:00,196 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-17 13:08:07,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3323000.0, ans=0.0 2024-08-17 13:08:08,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3323000.0, ans=0.0 2024-08-17 13:08:11,427 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 13:08:11,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=3323100.0, ans=22.5 2024-08-17 13:08:13,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3323100.0, ans=0.125 2024-08-17 13:08:19,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.89 vs. limit=10.0 2024-08-17 13:08:28,238 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-17 13:08:43,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3323300.0, ans=0.125 2024-08-17 13:08:44,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=3323300.0, ans=0.95 2024-08-17 13:08:54,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3323400.0, ans=0.1 2024-08-17 13:08:55,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-08-17 13:09:07,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 350, loss[loss=0.1103, beats_loss=0.01102, ecapa_loss=0.0001598, whisper_loss=0.09769, over 22011.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.000148, whisper_loss=0.0902, over 3196352.79 frames. 
], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:09:17,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3323500.0, ans=0.0 2024-08-17 13:09:43,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3323700.0, ans=0.2 2024-08-17 13:09:56,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.647e+01 2.195e+01 2.374e+01 2.744e+01 6.242e+01, threshold=4.747e+01, percent-clipped=1.0 2024-08-17 13:10:04,068 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 13:10:04,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3323900.0, ans=0.0 2024-08-17 13:10:09,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3323900.0, ans=0.5 2024-08-17 13:10:15,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 400, loss[loss=0.1038, beats_loss=0.01113, ecapa_loss=0.0001209, whisper_loss=0.09148, over 14211.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.09102, over 3360215.30 frames. ], batch size: 54, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:10:25,903 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-17 13:10:31,149 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-17 13:10:39,430 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 13:10:41,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3324200.0, ans=0.125 2024-08-17 13:10:50,285 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 14 from Vox, 51 fro AS 2024-08-17 13:11:05,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3324300.0, ans=0.125 2024-08-17 13:11:13,455 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 25 from Vox, 46 fro AS 2024-08-17 13:11:21,296 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 13:11:22,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 450, loss[loss=0.08141, beats_loss=0.01357, ecapa_loss=0.00011, whisper_loss=0.06673, over 20311.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001471, whisper_loss=0.08993, over 3486778.94 frames. ], batch size: 80, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:11:30,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-17 13:11:40,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3324600.0, ans=0.125 2024-08-17 13:11:49,601 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-17 13:12:05,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3324800.0, ans=0.125 2024-08-17 13:12:10,456 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.395e+01 2.662e+01 2.983e+01 2.736e+02, threshold=5.325e+01, percent-clipped=1.0 2024-08-17 13:12:12,140 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 13:12:29,379 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
36 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 13:12:30,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 500, loss[loss=0.1183, beats_loss=0.008536, ecapa_loss=0.0001795, whisper_loss=0.1079, over 22680.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.000147, whisper_loss=0.09003, over 3579746.27 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:12:32,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3325000.0, ans=0.2 2024-08-17 13:12:45,722 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 13:12:52,418 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-17 13:12:55,678 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-17 13:13:03,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3325200.0, ans=0.125 2024-08-17 13:13:11,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3325300.0, ans=0.125 2024-08-17 13:13:18,172 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 13:13:26,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-08-17 13:13:27,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3325400.0, ans=0.07 2024-08-17 13:13:30,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.50 vs. 
limit=12.0 2024-08-17 13:13:38,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 550, loss[loss=0.09818, beats_loss=0.01019, ecapa_loss=0.0001383, whisper_loss=0.08661, over 15410.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.0001469, whisper_loss=0.09073, over 3626398.96 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:13:45,915 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-17 13:13:58,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3325600.0, ans=0.125 2024-08-17 13:14:00,855 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.168e-01 2024-08-17 13:14:03,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3325600.0, ans=0.125 2024-08-17 13:14:12,209 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 13:14:19,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3325700.0, ans=0.125 2024-08-17 13:14:23,552 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
31 from LS+wenet, 31 from Vox, 23 fro AS 2024-08-17 13:14:27,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3325800.0, ans=0.125 2024-08-17 13:14:34,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.469e+01 2.694e+01 3.140e+01 4.946e+01, threshold=5.388e+01, percent-clipped=0.0 2024-08-17 13:14:35,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3325800.0, ans=0.1 2024-08-17 13:14:47,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.12 vs. limit=22.5 2024-08-17 13:14:48,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3325900.0, ans=0.125 2024-08-17 13:14:53,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 600, loss[loss=0.1233, beats_loss=0.009036, ecapa_loss=0.0001599, whisper_loss=0.1127, over 22932.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.09083, over 3683519.24 frames. ], batch size: 90, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:14:55,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3326000.0, ans=0.1 2024-08-17 13:15:03,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.15 vs. 
limit=15.0 2024-08-17 13:15:18,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3326100.0, ans=0.125 2024-08-17 13:15:32,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3326200.0, ans=0.125 2024-08-17 13:15:47,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3326300.0, ans=0.07 2024-08-17 13:16:07,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 650, loss[loss=0.1114, beats_loss=0.01187, ecapa_loss=0.0001143, whisper_loss=0.09834, over 23452.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001487, whisper_loss=0.09066, over 3736699.17 frames. ], batch size: 89, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:16:12,410 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 13:16:50,508 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 13:16:52,324 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-17 13:16:52,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-17 13:16:56,539 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 13:17:00,525 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.302e+01 2.585e+01 2.989e+01 8.771e+01, threshold=5.171e+01, percent-clipped=2.0 2024-08-17 13:17:06,265 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
22 from LS+wenet, 13 from Vox, 42 from AS 2024-08-17 13:17:10,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3326900.0, ans=0.125 2024-08-17 13:17:20,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 700, loss[loss=0.08982, beats_loss=0.01085, ecapa_loss=0.000142, whisper_loss=0.07756, over 17666.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01058, ecapa_loss=0.000148, whisper_loss=0.09048, over 3767024.69 frames. ], batch size: 71, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:17:47,403 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 from AS 2024-08-17 13:17:50,000 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 from AS 2024-08-17 13:17:57,932 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 22 from Vox, 42 from AS 2024-08-17 13:18:01,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3327200.0, ans=0.2 2024-08-17 13:18:02,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3327300.0, ans=10.0 2024-08-17 13:18:31,709 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 750, loss[loss=0.09747, beats_loss=0.008207, ecapa_loss=0.0001555, whisper_loss=0.08771, over 14504.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001477, whisper_loss=0.09084, over 3804435.79 frames. ], batch size: 55, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:18:45,420 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
18 from LS+wenet, 24 from Vox, 42 from AS 2024-08-17 13:19:24,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.330e+01 2.505e+01 2.763e+01 4.149e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-17 13:19:44,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 800, loss[loss=0.1015, beats_loss=0.009978, ecapa_loss=0.000147, whisper_loss=0.09005, over 15474.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.0001477, whisper_loss=0.09097, over 3853382.07 frames. ], batch size: 61, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:19:46,201 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 from AS 2024-08-17 13:19:50,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3328000.0, ans=0.125 2024-08-17 13:19:51,726 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 28 from LS+wenet, 14 from Vox, 20 from AS 2024-08-17 13:19:52,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2024-08-17 13:19:59,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0 2024-08-17 13:20:04,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328100.0, ans=0.1 2024-08-17 13:20:33,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.04 vs. 
limit=15.0 2024-08-17 13:20:41,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3328400.0, ans=0.95 2024-08-17 13:20:49,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-17 13:20:56,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 850, loss[loss=0.1032, beats_loss=0.01182, ecapa_loss=0.0001303, whisper_loss=0.09007, over 18544.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01061, ecapa_loss=0.0001478, whisper_loss=0.09062, over 3842612.61 frames. ], batch size: 72, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:20:57,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.18 vs. limit=22.5 2024-08-17 13:20:58,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3328500.0, ans=0.125 2024-08-17 13:21:10,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3328600.0, ans=0.125 2024-08-17 13:21:21,704 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
22 from LS+wenet, 14 from Vox, 40 from AS 2024-08-17 13:21:22,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3328600.0, ans=0.125 2024-08-17 13:21:23,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3328600.0, ans=0.125 2024-08-17 13:21:30,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3328700.0, ans=0.1 2024-08-17 13:21:35,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3328700.0, ans=0.0 2024-08-17 13:21:49,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.262e+01 2.529e+01 2.778e+01 3.755e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-17 13:21:49,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3328800.0, ans=0.09899494936611666 2024-08-17 13:22:03,676 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 from AS 2024-08-17 13:22:08,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 900, loss[loss=0.09122, beats_loss=0.007782, ecapa_loss=0.0001811, whisper_loss=0.08163, over 14352.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.000148, whisper_loss=0.08992, over 3858727.33 frames. ], batch size: 59, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:22:18,824 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 20 from Vox, 23 from AS 2024-08-17 13:22:19,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3329000.0, ans=0.1 2024-08-17 13:22:27,074 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 from AS 2024-08-17 13:22:46,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3329200.0, ans=0.0 2024-08-17 13:22:49,616 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 20 from Vox, 37 from AS 2024-08-17 13:22:49,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3329200.0, ans=0.125 2024-08-17 13:22:56,993 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 29 from Vox, 18 from AS 2024-08-17 13:22:57,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:22:57,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3329300.0, ans=0.0 2024-08-17 13:23:01,937 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 23 from Vox, 22 from AS 2024-08-17 13:23:02,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:23:03,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3329300.0, ans=0.125 2024-08-17 13:23:09,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3329400.0, ans=0.0 2024-08-17 13:23:12,897 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 from AS 2024-08-17 13:23:20,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 950, loss[loss=0.1022, beats_loss=0.009994, ecapa_loss=0.0001496, whisper_loss=0.09069, over 22536.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001484, whisper_loss=0.08953, over 3845544.83 frames. ], batch size: 93, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:23:26,280 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 18 from Vox, 49 from AS 2024-08-17 13:23:42,981 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.75 vs. limit=10.0 2024-08-17 13:24:00,176 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 24 from Vox, 41 from AS 2024-08-17 13:24:12,390 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.272e+01 2.519e+01 2.767e+01 4.304e+01, threshold=5.037e+01, percent-clipped=0.0 2024-08-17 13:24:12,528 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 from AS 2024-08-17 13:24:15,209 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS 2024-08-17 13:24:16,709 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 14 from Vox, 26 from AS 2024-08-17 13:24:26,770 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 from AS 2024-08-17 13:24:28,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3329900.0, ans=0.1 2024-08-17 13:24:32,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1000, loss[loss=0.09381, beats_loss=0.0128, ecapa_loss=0.0001446, whisper_loss=0.07957, over 22375.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01058, ecapa_loss=0.0001486, whisper_loss=0.08924, over 3864352.76 frames. ], batch size: 92, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:24:44,859 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
17 from LS+wenet, 16 from Vox, 34 from AS 2024-08-17 13:25:09,568 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.260e-02 2024-08-17 13:25:28,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3330200.0, ans=0.0 2024-08-17 13:25:30,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3330200.0, ans=0.125 2024-08-17 13:25:31,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-08-17 13:25:32,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3330200.0, ans=0.2 2024-08-17 13:25:35,964 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 17 from Vox, 39 from AS 2024-08-17 13:25:39,044 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 18 from Vox, 37 from AS 2024-08-17 13:25:46,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3330300.0, ans=0.1 2024-08-17 13:25:52,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3330400.0, ans=0.125 2024-08-17 13:25:54,622 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 from AS 2024-08-17 13:26:07,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1050, loss[loss=0.09634, beats_loss=0.009793, ecapa_loss=0.0001546, whisper_loss=0.085, over 14790.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01069, ecapa_loss=0.0001468, whisper_loss=0.08859, over 3867215.76 frames. 
], batch size: 59, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:26:08,577 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 from AS 2024-08-17 13:26:48,812 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 from AS 2024-08-17 13:26:58,825 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.292e+01 2.531e+01 2.835e+01 4.393e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-17 13:27:18,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1100, loss[loss=0.1343, beats_loss=0.005628, ecapa_loss=0.0001984, whisper_loss=0.1266, over 20575.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01064, ecapa_loss=0.0001464, whisper_loss=0.08923, over 3857614.35 frames. ], batch size: 83, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:27:20,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3331000.0, ans=0.125 2024-08-17 13:27:21,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-17 13:27:21,686 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 from AS 2024-08-17 13:27:34,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3331100.0, ans=0.125 2024-08-17 13:27:37,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3331100.0, ans=0.125 2024-08-17 13:27:52,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2024-08-17 13:27:59,191 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-17 13:28:01,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3331300.0, ans=0.2 2024-08-17 13:28:09,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-17 13:28:09,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=12.0 2024-08-17 13:28:27,936 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 22 from Vox, 20 from AS 2024-08-17 13:28:32,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1150, loss[loss=0.1164, beats_loss=0.01061, ecapa_loss=0.0001358, whisper_loss=0.1045, over 18637.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01064, ecapa_loss=0.0001475, whisper_loss=0.08898, over 3885786.96 frames. ], batch size: 73, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:28:47,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3331600.0, ans=0.2 2024-08-17 13:28:47,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3331600.0, ans=0.0 2024-08-17 13:29:07,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3331700.0, ans=0.0 2024-08-17 13:29:10,318 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 from AS 2024-08-17 13:29:11,905 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 20 from Vox, 38 from AS 2024-08-17 13:29:12,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3331700.0, ans=0.125 2024-08-17 13:29:24,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3331800.0, ans=0.1 2024-08-17 13:29:26,566 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.418e+01 2.645e+01 2.995e+01 9.791e+01, threshold=5.290e+01, percent-clipped=1.0 2024-08-17 13:29:34,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3331900.0, ans=0.0 2024-08-17 13:29:37,662 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 29 from Vox, 36 from AS 2024-08-17 13:29:46,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332000.0, ans=0.1 2024-08-17 13:29:46,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1200, loss[loss=0.1035, beats_loss=0.0109, ecapa_loss=0.0001296, whisper_loss=0.09127, over 18787.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001477, whisper_loss=0.08926, over 3904073.03 frames. ], batch size: 73, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:29:53,697 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 from AS 2024-08-17 13:30:18,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.81 vs. 
limit=15.0 2024-08-17 13:30:23,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3332200.0, ans=0.0 2024-08-17 13:30:25,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2024-08-17 13:30:36,481 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 8 from Vox, 25 from AS 2024-08-17 13:30:42,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3332300.0, ans=0.1 2024-08-17 13:30:44,951 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08344534784555435, model_norm_threshold=52.90373992919922 2024-08-17 13:30:45,119 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.conv_module2.depthwise_conv.causal_conv.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.267e+04, grad_sumsq=4.529e+05, orig_rms_sq=1.825e-01 2024-08-17 13:30:45,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3332400.0, ans=0.125 2024-08-17 13:30:46,493 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 from AS 2024-08-17 13:30:57,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3332400.0, ans=0.125 2024-08-17 13:30:59,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3332400.0, ans=0.1 2024-08-17 13:31:01,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1250, loss[loss=0.1024, beats_loss=0.01091, ecapa_loss=0.0001245, whisper_loss=0.0902, over 22467.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01066, ecapa_loss=0.0001471, whisper_loss=0.08932, over 3908248.68 frames. ], batch size: 88, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:31:05,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3332500.0, ans=0.125 2024-08-17 13:31:07,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3332500.0, ans=0.0 2024-08-17 13:31:10,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-17 13:31:21,484 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 13 from Vox, 22 from AS 2024-08-17 13:31:26,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3332600.0, ans=0.05 2024-08-17 13:31:29,747 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 from AS 2024-08-17 13:31:30,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3332700.0, ans=0.0 2024-08-17 13:31:32,956 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.039e-02 2024-08-17 13:31:39,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3332700.0, ans=0.2 2024-08-17 13:31:54,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.269e+01 2.597e+01 2.986e+01 6.340e+02, threshold=5.193e+01, percent-clipped=3.0 2024-08-17 13:31:54,904 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
32 from LS+wenet, 26 from Vox, 29 from AS 2024-08-17 13:31:56,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3332800.0, ans=0.125 2024-08-17 13:32:03,259 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS 2024-08-17 13:32:09,664 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 from AS 2024-08-17 13:32:11,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3332900.0, ans=0.0 2024-08-17 13:32:15,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1300, loss[loss=0.1041, beats_loss=0.008865, ecapa_loss=0.000115, whisper_loss=0.09408, over 15741.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01071, ecapa_loss=0.0001472, whisper_loss=0.0891, over 3894704.65 frames. ], batch size: 57, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:32:15,391 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 from AS 2024-08-17 13:32:30,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3333100.0, ans=0.125 2024-08-17 13:32:37,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3333100.0, ans=0.125 2024-08-17 13:33:26,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3333400.0, ans=0.125 2024-08-17 13:33:34,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1350, loss[loss=0.1158, beats_loss=0.009387, ecapa_loss=0.0001761, whisper_loss=0.1047, over 22400.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01066, ecapa_loss=0.0001473, whisper_loss=0.08922, over 3888550.90 frames. 
], batch size: 91, lr: 2.68e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:33:39,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3333500.0, ans=0.0 2024-08-17 13:33:51,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3333600.0, ans=0.125 2024-08-17 13:33:52,489 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 from AS 2024-08-17 13:34:07,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3333700.0, ans=0.125 2024-08-17 13:34:24,725 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.323e+01 2.578e+01 2.886e+01 4.482e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 13:34:31,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3333900.0, ans=0.125 2024-08-17 13:34:41,603 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 from AS 2024-08-17 13:34:45,547 WARNING [optim.py:496] (3/4) Scaling gradients by 0.028498075902462006, model_norm_threshold=51.5612907409668 2024-08-17 13:34:45,718 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.470e+05, grad_sumsq=7.470e+05, orig_rms_sq=1.000e+00 2024-08-17 13:34:45,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1400, loss[loss=0.09917, beats_loss=0.009459, ecapa_loss=0.0001508, whisper_loss=0.0882, over 20389.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.0001474, whisper_loss=0.08951, over 3899663.77 frames. 
], batch size: 81, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:34:47,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3334000.0, ans=0.2 2024-08-17 13:35:11,129 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 from AS 2024-08-17 13:35:34,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3334300.0, ans=15.0 2024-08-17 13:35:38,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3334300.0, ans=0.015 2024-08-17 13:35:46,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3334400.0, ans=0.125 2024-08-17 13:35:47,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.23 vs. limit=22.5 2024-08-17 13:35:55,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3334400.0, ans=0.2 2024-08-17 13:35:57,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1450, loss[loss=0.0999, beats_loss=0.01265, ecapa_loss=0.0001255, whisper_loss=0.086, over 21790.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01076, ecapa_loss=0.000148, whisper_loss=0.08828, over 3882230.14 frames. ], batch size: 87, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:36:00,909 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:36:22,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0 2024-08-17 13:36:29,037 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 17 from Vox, 24 from AS 2024-08-17 13:36:32,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3334700.0, ans=0.125 2024-08-17 13:36:49,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.311e+01 2.535e+01 2.723e+01 1.809e+03, threshold=5.071e+01, percent-clipped=1.0 2024-08-17 13:37:03,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.86 vs. limit=15.0 2024-08-17 13:37:09,762 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:37:10,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1500, loss[loss=0.09478, beats_loss=0.01369, ecapa_loss=0.0001012, whisper_loss=0.08007, over 22627.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01072, ecapa_loss=0.0001478, whisper_loss=0.08907, over 3898150.99 frames. ], batch size: 88, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:37:14,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3335000.0, ans=0.1 2024-08-17 13:37:47,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3335200.0, ans=0.1 2024-08-17 13:38:01,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3335300.0, ans=0.1 2024-08-17 13:38:09,723 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 15 from Vox, 36 from AS 2024-08-17 13:38:19,160 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 39 from LS+wenet, 15 from Vox, 24 from AS 2024-08-17 13:38:24,113 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 17 from Vox, 38 from AS 2024-08-17 13:38:26,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1550, loss[loss=0.1165, beats_loss=0.008068, ecapa_loss=0.0001886, whisper_loss=0.1065, over 21477.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001481, whisper_loss=0.09005, over 3893756.66 frames. ], batch size: 86, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:38:30,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3335500.0, ans=0.125 2024-08-17 13:38:53,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3335600.0, ans=0.0 2024-08-17 13:39:04,788 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS 2024-08-17 13:39:15,907 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 from AS 2024-08-17 13:39:17,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3335800.0, ans=0.2 2024-08-17 13:39:20,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.357e+01 2.589e+01 2.903e+01 1.401e+02, threshold=5.177e+01, percent-clipped=4.0 2024-08-17 13:39:23,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3335800.0, ans=0.5 2024-08-17 13:39:40,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1600, loss[loss=0.08825, beats_loss=0.01371, ecapa_loss=0.0001224, whisper_loss=0.07331, over 22098.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.0001485, whisper_loss=0.0891, over 3882730.66 frames. 
], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:39:42,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3336000.0, ans=0.2 2024-08-17 13:39:46,576 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 13:39:57,488 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 20 from Vox, 36 from AS 2024-08-17 13:40:08,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.26 vs. limit=6.0 2024-08-17 13:40:19,982 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 from AS 2024-08-17 13:40:23,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3336200.0, ans=0.125 2024-08-17 13:40:26,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=12.0 2024-08-17 13:40:32,492 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 31 from Vox, 32 from AS 2024-08-17 13:40:34,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3336300.0, ans=0.125 2024-08-17 13:40:40,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3336400.0, ans=0.125 2024-08-17 13:40:54,037 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 from AS 2024-08-17 13:40:55,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1650, loss[loss=0.108, beats_loss=0.01242, ecapa_loss=0.0001432, whisper_loss=0.0941, over 22672.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.0001483, whisper_loss=0.08915, over 3855464.97 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:40:58,868 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-17 13:41:00,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2024-08-17 13:41:01,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3336500.0, ans=0.2 2024-08-17 13:41:02,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3336500.0, ans=0.05 2024-08-17 13:41:08,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3336500.0, ans=0.0 2024-08-17 13:41:10,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3336600.0, ans=0.125 2024-08-17 13:41:13,013 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.151e+00 2024-08-17 13:41:16,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3336600.0, ans=0.1 2024-08-17 13:41:16,940 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 13:41:24,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3336600.0, ans=0.125 2024-08-17 13:41:30,008 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 13:41:32,847 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 13:41:34,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3336700.0, ans=0.0 2024-08-17 13:41:37,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.49 vs. limit=10.0 2024-08-17 13:41:47,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.10 vs. limit=22.5 2024-08-17 13:41:50,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.288e+01 2.508e+01 2.798e+01 4.258e+01, threshold=5.017e+01, percent-clipped=0.0 2024-08-17 13:41:52,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-08-17 13:42:02,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3336900.0, ans=0.2 2024-08-17 13:42:03,847 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 21 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-17 13:42:11,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1700, loss[loss=0.08537, beats_loss=0.01304, ecapa_loss=0.0001188, whisper_loss=0.07114, over 18688.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01069, ecapa_loss=0.0001479, whisper_loss=0.09025, over 3901997.27 frames. 
], batch size: 74, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:42:13,421 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06336042284965515, model_norm_threshold=50.16535949707031 2024-08-17 13:42:13,591 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.458e+04, grad_sumsq=5.458e+04, orig_rms_sq=1.000e+00 2024-08-17 13:42:15,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3337000.0, ans=0.0 2024-08-17 13:42:33,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3337100.0, ans=0.07 2024-08-17 13:42:33,899 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 13:42:36,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3337100.0, ans=0.125 2024-08-17 13:42:40,794 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-17 13:42:46,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3337200.0, ans=0.125 2024-08-17 13:43:00,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3337300.0, ans=0.2 2024-08-17 13:43:03,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-17 13:43:07,735 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 13:43:22,181 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-17 13:43:26,300 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1750, loss[loss=0.1242, beats_loss=0.009226, ecapa_loss=0.0001347, whisper_loss=0.1136, over 22743.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001488, whisper_loss=0.09034, over 3916576.82 frames. ], batch size: 88, lr: 2.67e-03, grad_scale: 1.152921504606847e+18 2024-08-17 13:43:28,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3337500.0, ans=0.0 2024-08-17 13:43:29,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3337500.0, ans=0.015 2024-08-17 13:43:31,069 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 13:43:42,898 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-17 13:43:52,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.17 vs. limit=15.0 2024-08-17 13:44:22,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.382e+01 2.704e+01 3.004e+01 7.917e+02, threshold=5.409e+01, percent-clipped=2.0 2024-08-17 13:44:27,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3337900.0, ans=0.125 2024-08-17 13:44:43,192 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1800, loss[loss=0.09817, beats_loss=0.008806, ecapa_loss=0.0001347, whisper_loss=0.08802, over 14348.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001482, whisper_loss=0.09059, over 3929221.79 frames. 
], batch size: 55, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:45:08,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-17 13:45:11,548 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-17 13:45:30,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3338300.0, ans=0.0 2024-08-17 13:45:58,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1850, loss[loss=0.0813, beats_loss=0.01252, ecapa_loss=0.0001666, whisper_loss=0.06712, over 21399.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001473, whisper_loss=0.09015, over 3937402.29 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:46:14,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3338600.0, ans=0.2 2024-08-17 13:46:16,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3338600.0, ans=0.0 2024-08-17 13:46:23,654 WARNING [optim.py:496] (3/4) Scaling gradients by 0.028131451457738876, model_norm_threshold=54.08749008178711 2024-08-17 13:46:23,827 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.044e+06, grad_sumsq=1.044e+06, orig_rms_sq=1.000e+00 2024-08-17 13:46:24,036 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-17 13:46:30,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3338700.0, ans=0.0 2024-08-17 13:46:30,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0 2024-08-17 13:46:33,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3338700.0, ans=0.2 2024-08-17 13:46:43,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. limit=15.0 2024-08-17 13:46:53,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.302e+01 2.591e+01 3.149e+01 1.923e+03, threshold=5.181e+01, percent-clipped=2.0 2024-08-17 13:46:55,023 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-17 13:46:57,935 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-17 13:47:02,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3338900.0, ans=0.125 2024-08-17 13:47:13,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1900, loss[loss=0.09938, beats_loss=0.0107, ecapa_loss=0.0001684, whisper_loss=0.087, over 21241.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01081, ecapa_loss=0.0001475, whisper_loss=0.08914, over 3890861.39 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:48:27,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. 
limit=15.0 2024-08-17 13:48:35,085 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 1950, loss[loss=0.1061, beats_loss=0.01159, ecapa_loss=0.0001457, whisper_loss=0.09309, over 22603.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01083, ecapa_loss=0.0001469, whisper_loss=0.08913, over 3902979.55 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:48:36,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3339500.0, ans=0.1 2024-08-17 13:48:47,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3339500.0, ans=0.125 2024-08-17 13:48:47,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3339500.0, ans=0.125 2024-08-17 13:48:51,498 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.155e+00 2024-08-17 13:48:54,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.43 vs. limit=10.0 2024-08-17 13:49:00,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3339600.0, ans=0.0 2024-08-17 13:49:00,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=12.0 2024-08-17 13:49:06,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3339700.0, ans=0.07 2024-08-17 13:49:15,023 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
26 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 13:49:19,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3339700.0, ans=0.125 2024-08-17 13:49:20,302 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-17 13:49:32,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3339800.0, ans=0.0 2024-08-17 13:49:32,885 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 13:49:35,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-17 13:49:36,197 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.407e+01 2.684e+01 3.018e+01 1.325e+02, threshold=5.369e+01, percent-clipped=1.0 2024-08-17 13:49:56,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2000, loss[loss=0.09989, beats_loss=0.009355, ecapa_loss=0.0001403, whisper_loss=0.08913, over 15819.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01072, ecapa_loss=0.0001471, whisper_loss=0.0895, over 3891215.39 frames. ], batch size: 61, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:50:02,826 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-17 13:50:06,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5 2024-08-17 13:50:08,501 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
24 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 13:50:13,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3340100.0, ans=0.2 2024-08-17 13:50:20,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3340100.0, ans=0.125 2024-08-17 13:50:32,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3340200.0, ans=0.1 2024-08-17 13:50:33,531 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 13:50:37,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.73 vs. limit=22.5 2024-08-17 13:50:48,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-17 13:51:02,663 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-17 13:51:03,875 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 13:51:17,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2050, loss[loss=0.08673, beats_loss=0.01061, ecapa_loss=0.0001573, whisper_loss=0.07454, over 16160.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001469, whisper_loss=0.09006, over 3866807.19 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:52:16,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3340700.0, ans=10.0 2024-08-17 13:52:18,871 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-17 13:52:23,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3340700.0, ans=0.125 2024-08-17 13:52:40,623 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 13:52:41,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3340800.0, ans=0.125 2024-08-17 13:52:42,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.326e+01 2.537e+01 2.750e+01 3.562e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-17 13:52:56,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3340900.0, ans=0.0 2024-08-17 13:53:03,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2100, loss[loss=0.09642, beats_loss=0.01269, ecapa_loss=0.0001059, whisper_loss=0.08267, over 16888.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001466, whisper_loss=0.09021, over 3883295.51 frames. ], batch size: 63, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:53:13,629 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
26 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-17 13:53:29,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3341100.0, ans=0.0 2024-08-17 13:53:35,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3341200.0, ans=0.125 2024-08-17 13:53:44,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3341200.0, ans=0.0 2024-08-17 13:53:52,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3341300.0, ans=0.125 2024-08-17 13:54:02,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3341300.0, ans=0.0 2024-08-17 13:54:21,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2150, loss[loss=0.1001, beats_loss=0.01097, ecapa_loss=0.0001477, whisper_loss=0.08767, over 23197.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.09038, over 3874026.84 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:54:35,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3341600.0, ans=0.0 2024-08-17 13:54:41,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3341600.0, ans=0.1 2024-08-17 13:54:46,583 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 13:55:03,928 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 13:55:08,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.26 vs. 
limit=22.5 2024-08-17 13:55:09,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3341800.0, ans=0.0 2024-08-17 13:55:15,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3341800.0, ans=0.0 2024-08-17 13:55:16,220 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.587e+01 2.285e+01 2.479e+01 2.772e+01 4.505e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-17 13:55:33,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-17 13:55:35,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2200, loss[loss=0.1062, beats_loss=0.01059, ecapa_loss=0.0001591, whisper_loss=0.09397, over 21973.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001484, whisper_loss=0.09013, over 3884150.63 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:55:44,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.80 vs. limit=15.0 2024-08-17 13:56:16,868 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-17 13:56:39,954 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 13:56:41,131 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 16 from Vox, 43 fro AS 2024-08-17 13:56:44,121 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 13:56:44,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.81 vs. 
limit=15.0 2024-08-17 13:56:47,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2250, loss[loss=0.1045, beats_loss=0.009609, ecapa_loss=0.0001743, whisper_loss=0.09315, over 22166.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001486, whisper_loss=0.09046, over 3897340.64 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:56:56,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3342500.0, ans=0.07 2024-08-17 13:56:59,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3342500.0, ans=0.2 2024-08-17 13:57:00,221 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-17 13:57:07,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3342600.0, ans=0.0 2024-08-17 13:57:14,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3342700.0, ans=0.025 2024-08-17 13:57:27,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3342800.0, ans=0.125 2024-08-17 13:57:36,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3342800.0, ans=0.1 2024-08-17 13:57:38,162 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.294e+01 2.494e+01 2.818e+01 3.738e+02, threshold=4.988e+01, percent-clipped=1.0 2024-08-17 13:57:46,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3342900.0, ans=0.125 2024-08-17 13:57:57,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2300, loss[loss=0.1127, beats_loss=0.009172, ecapa_loss=0.0001249, whisper_loss=0.1023, over 
21849.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001485, whisper_loss=0.08999, over 3905090.85 frames. ], batch size: 83, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:58:02,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343000.0, ans=0.1 2024-08-17 13:58:05,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0 2024-08-17 13:58:09,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.95 vs. limit=22.5 2024-08-17 13:58:11,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3343100.0, ans=0.0 2024-08-17 13:58:26,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3343200.0, ans=0.125 2024-08-17 13:58:26,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.23 vs. limit=22.5 2024-08-17 13:58:30,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3343200.0, ans=0.125 2024-08-17 13:58:36,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3343200.0, ans=0.125 2024-08-17 13:58:40,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3343300.0, ans=0.125 2024-08-17 13:58:49,327 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-17 13:58:56,073 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 13:58:59,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3343400.0, ans=0.0 2024-08-17 13:59:04,464 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 13:59:08,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2350, loss[loss=0.1131, beats_loss=0.01237, ecapa_loss=0.000134, whisper_loss=0.09942, over 18577.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01071, ecapa_loss=0.0001483, whisper_loss=0.08971, over 3938615.40 frames. ], batch size: 73, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 13:59:30,276 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 13:59:31,757 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-17 13:59:43,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343700.0, ans=0.1 2024-08-17 13:59:48,647 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 13:59:54,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3343800.0, ans=0.1 2024-08-17 13:59:55,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=12.0 2024-08-17 13:59:57,054 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 14:00:00,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.33 vs. 
limit=10.0 2024-08-17 14:00:00,893 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.280e+01 2.527e+01 2.757e+01 5.363e+01, threshold=5.054e+01, percent-clipped=1.0 2024-08-17 14:00:07,947 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 14:00:09,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3343900.0, ans=0.125 2024-08-17 14:00:19,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2400, loss[loss=0.09132, beats_loss=0.008695, ecapa_loss=0.0001932, whisper_loss=0.08069, over 17801.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001489, whisper_loss=0.09021, over 3945348.58 frames. ], batch size: 72, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:00:26,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3344000.0, ans=0.125 2024-08-17 14:00:45,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3344200.0, ans=0.125 2024-08-17 14:01:04,383 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 14:01:17,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3344400.0, ans=0.1 2024-08-17 14:01:19,942 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 14:01:21,885 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-17 14:01:22,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3344400.0, ans=0.1 2024-08-17 14:01:22,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-08-17 14:01:25,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2450, loss[loss=0.08536, beats_loss=0.01321, ecapa_loss=0.0001495, whisper_loss=0.07065, over 17269.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001491, whisper_loss=0.08996, over 3892320.57 frames. ], batch size: 69, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:01:33,307 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 14:02:01,849 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-17 14:02:02,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3344700.0, ans=0.125 2024-08-17 14:02:09,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3344800.0, ans=0.0 2024-08-17 14:02:12,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=12.0 2024-08-17 14:02:16,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2024-08-17 14:02:16,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.328e+01 2.680e+01 2.976e+01 4.831e+01, threshold=5.360e+01, percent-clipped=0.0 2024-08-17 14:02:17,369 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 14:02:24,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3344900.0, ans=0.125 2024-08-17 14:02:29,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3344900.0, ans=0.0 2024-08-17 14:02:33,457 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:02:35,425 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2500, loss[loss=0.1043, beats_loss=0.01201, ecapa_loss=0.0001347, whisper_loss=0.09096, over 22484.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001478, whisper_loss=0.08973, over 3922741.20 frames. ], batch size: 89, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:02:39,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.68 vs. limit=15.0 2024-08-17 14:02:47,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-17 14:03:01,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3345200.0, ans=0.125 2024-08-17 14:03:18,080 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
24 from LS+wenet, 19 from Vox, 48 from AS 2024-08-17 14:03:19,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3345300.0, ans=0.125 2024-08-17 14:03:30,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3345400.0, ans=0.0 2024-08-17 14:03:43,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2550, loss[loss=0.08363, beats_loss=0.01319, ecapa_loss=0.0001153, whisper_loss=0.06928, over 20848.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01073, ecapa_loss=0.0001477, whisper_loss=0.0891, over 3876616.62 frames. ], batch size: 82, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:03:49,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.24 vs. limit=15.0 2024-08-17 14:04:19,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3345700.0, ans=0.0 2024-08-17 14:04:35,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.361e+01 2.576e+01 2.935e+01 4.539e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-17 14:04:36,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3345800.0, ans=0.125 2024-08-17 14:04:37,245 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 18 from Vox, 24 from AS 2024-08-17 14:04:49,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3345900.0, ans=0.125 2024-08-17 14:04:54,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2600, loss[loss=0.1047, beats_loss=0.008303, ecapa_loss=0.0001417, whisper_loss=0.09499, over 15392.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.0001478, whisper_loss=0.09003, over 3922866.20 frames. ], batch size: 57, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:05:03,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3346000.0, ans=0.2 2024-08-17 14:05:16,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3346100.0, ans=0.125 2024-08-17 14:05:22,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=3346200.0, ans=0.02 2024-08-17 14:05:23,825 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 12 from Vox, 33 from AS 2024-08-17 14:05:26,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3346200.0, ans=0.05 2024-08-17 14:06:02,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2650, loss[loss=0.1012, beats_loss=0.01097, ecapa_loss=0.0001616, whisper_loss=0.08859, over 22170.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.0001471, whisper_loss=0.09015, over 3918910.32 frames. ], batch size: 91, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:06:08,453 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.24 vs. 
limit=22.5 2024-08-17 14:06:14,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3346600.0, ans=0.0 2024-08-17 14:06:30,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3346700.0, ans=0.125 2024-08-17 14:06:52,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+01 2.232e+01 2.460e+01 2.794e+01 3.969e+01, threshold=4.921e+01, percent-clipped=0.0 2024-08-17 14:06:54,968 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 18 from Vox, 34 from AS 2024-08-17 14:07:06,788 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 from AS 2024-08-17 14:07:09,139 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3346900.0, ans=0.125 2024-08-17 14:07:13,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2700, loss[loss=0.09605, beats_loss=0.01284, ecapa_loss=0.0001397, whisper_loss=0.08182, over 16553.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001477, whisper_loss=0.09067, over 3916089.12 frames. ], batch size: 67, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:07:45,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2024-08-17 14:08:01,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.72 vs. 
limit=15.0 2024-08-17 14:08:11,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3347300.0, ans=0.0 2024-08-17 14:08:13,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.46 vs. limit=6.0 2024-08-17 14:08:16,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3347400.0, ans=0.0 2024-08-17 14:08:28,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3347400.0, ans=0.0 2024-08-17 14:08:30,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2750, loss[loss=0.1001, beats_loss=0.01171, ecapa_loss=0.0001029, whisper_loss=0.08737, over 18681.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001487, whisper_loss=0.09031, over 3919848.01 frames. ], batch size: 69, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:08:52,906 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 16 from Vox, 32 from AS 2024-08-17 14:08:58,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3347700.0, ans=0.125 2024-08-17 14:09:00,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=8.0 2024-08-17 14:09:21,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. 
limit=22.5 2024-08-17 14:09:23,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.299e+01 2.509e+01 2.759e+01 4.042e+01, threshold=5.018e+01, percent-clipped=0.0 2024-08-17 14:09:27,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3347900.0, ans=0.2 2024-08-17 14:09:38,167 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 14 from Vox, 40 from AS 2024-08-17 14:09:42,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2800, loss[loss=0.118, beats_loss=0.008982, ecapa_loss=0.0001326, whisper_loss=0.1077, over 24209.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001494, whisper_loss=0.09086, over 3903509.49 frames. ], batch size: 91, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:09:50,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3348000.0, ans=0.2 2024-08-17 14:09:56,346 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 17 from Vox, 29 from AS 2024-08-17 14:10:08,087 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 21 from Vox, 31 from AS 2024-08-17 14:10:08,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3348100.0, ans=0.2 2024-08-17 14:10:11,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3348100.0, ans=0.125 2024-08-17 14:10:13,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3348200.0, ans=0.0 2024-08-17 14:10:22,395 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
28 from LS+wenet, 28 from Vox, 38 from AS 2024-08-17 14:10:51,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0 2024-08-17 14:10:51,708 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 14:10:58,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2850, loss[loss=0.06006, beats_loss=0.01571, ecapa_loss=0.0001079, whisper_loss=0.04327, over 14379.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001488, whisper_loss=0.09049, over 3873097.90 frames. ], batch size: 58, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:11:06,733 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 28 from Vox, 30 from AS 2024-08-17 14:11:14,817 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 14 from Vox, 34 from AS 2024-08-17 14:11:15,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3348600.0, ans=0.0 2024-08-17 14:11:16,684 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 16 from Vox, 37 from AS 2024-08-17 14:11:24,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.67 vs. 
limit=15.0 2024-08-17 14:11:36,962 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:11:37,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3348700.0, ans=0.0 2024-08-17 14:11:56,343 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.404e+01 2.584e+01 2.839e+01 1.572e+02, threshold=5.168e+01, percent-clipped=1.0 2024-08-17 14:12:09,264 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 from AS 2024-08-17 14:12:17,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2900, loss[loss=0.08596, beats_loss=0.01219, ecapa_loss=0.0001309, whisper_loss=0.07247, over 21362.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.000148, whisper_loss=0.09022, over 3906933.91 frames. ], batch size: 85, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:12:32,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3349100.0, ans=0.1 2024-08-17 14:12:49,722 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 from AS 2024-08-17 14:12:58,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3349200.0, ans=0.125 2024-08-17 14:13:09,187 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 22 from Vox, 28 from AS 2024-08-17 14:13:19,072 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 12 from LS+wenet, 21 from Vox, 31 from AS 2024-08-17 14:13:24,378 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 22 from Vox, 23 from AS 2024-08-17 14:13:28,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 2950, loss[loss=0.09839, beats_loss=0.009811, ecapa_loss=0.0001705, whisper_loss=0.08688, over 22057.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09024, over 3906144.73 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:13:45,254 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 21 from Vox, 22 from AS 2024-08-17 14:13:51,953 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 19 from Vox, 42 from AS 2024-08-17 14:13:52,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3349600.0, ans=0.125 2024-08-17 14:14:01,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.03 vs. limit=22.5 2024-08-17 14:14:09,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3349800.0, ans=0.125 2024-08-17 14:14:15,802 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.281e+01 2.587e+01 2.940e+01 5.139e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-17 14:14:21,180 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 12 from Vox, 35 from AS 2024-08-17 14:14:28,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3349900.0, ans=0.0 2024-08-17 14:14:32,941 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3000, loss[loss=0.09799, beats_loss=0.01245, ecapa_loss=0.0001301, whisper_loss=0.08424, over 23787.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001481, whisper_loss=0.09082, over 3926334.73 frames. 
], batch size: 93, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:14:32,942 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 14:15:10,879 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005243, whisper_loss=0.2458, over 922467.00 frames. 2024-08-17 14:15:29,020 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004133, beats_loss=0, ecapa_loss=0.0004133, whisper_loss=0, over 939242.00 frames. 2024-08-17 14:17:06,336 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.6458, 3.4649, 3.0431, 3.3543], device='cuda:3') 2024-08-17 14:17:17,981 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 14:17:17,984 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 14:17:18,161 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 22 from Vox, 38 from AS 2024-08-17 14:17:26,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3350000.0, ans=0.125 2024-08-17 14:17:42,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.95 vs. limit=22.5 2024-08-17 14:17:57,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3350200.0, ans=0.125 2024-08-17 14:18:08,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350300.0, ans=0.1 2024-08-17 14:18:09,835 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
30 from LS+wenet, 18 from Vox, 25 from AS 2024-08-17 14:18:26,899 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3050, loss[loss=0.0933, beats_loss=0.01115, ecapa_loss=0.0001314, whisper_loss=0.08084, over 17801.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001478, whisper_loss=0.0909, over 3944358.99 frames. ], batch size: 70, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:18:27,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3350500.0, ans=0.1 2024-08-17 14:18:36,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.05 vs. limit=10.0 2024-08-17 14:18:44,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3350600.0, ans=0.125 2024-08-17 14:18:52,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3350600.0, ans=0.0 2024-08-17 14:18:55,511 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 29 from Vox, 34 from AS 2024-08-17 14:19:00,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3350700.0, ans=0.2 2024-08-17 14:19:14,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3350800.0, ans=0.125 2024-08-17 14:19:17,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.330e+01 2.532e+01 2.837e+01 4.705e+01, threshold=5.064e+01, percent-clipped=0.0 2024-08-17 14:19:23,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. 
limit=15.0 2024-08-17 14:19:34,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3100, loss[loss=0.1044, beats_loss=0.00946, ecapa_loss=0.000142, whisper_loss=0.09357, over 23566.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001482, whisper_loss=0.09114, over 3975330.52 frames. ], batch size: 95, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:19:51,752 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 from AS 2024-08-17 14:19:52,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3351100.0, ans=0.0 2024-08-17 14:19:52,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3351100.0, ans=0.1 2024-08-17 14:19:53,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3351100.0, ans=0.0 2024-08-17 14:20:20,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=12.0 2024-08-17 14:20:21,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3351300.0, ans=0.125 2024-08-17 14:20:30,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3351400.0, ans=0.2 2024-08-17 14:20:37,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3150, loss[loss=0.08138, beats_loss=0.01425, ecapa_loss=0.0001369, whisper_loss=0.06576, over 18269.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01064, ecapa_loss=0.0001477, whisper_loss=0.09137, over 3964899.82 frames. 
], batch size: 77, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:20:40,128 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.006e+00 2024-08-17 14:20:45,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3351500.0, ans=0.2 2024-08-17 14:20:58,236 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 9 from Vox, 35 from AS 2024-08-17 14:21:09,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3351700.0, ans=0.0 2024-08-17 14:21:19,240 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 from AS 2024-08-17 14:21:23,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.388e+01 2.692e+01 3.115e+01 4.630e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-17 14:21:31,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3351900.0, ans=0.125 2024-08-17 14:21:32,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3351900.0, ans=0.125 2024-08-17 14:21:39,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3200, loss[loss=0.09744, beats_loss=0.01006, ecapa_loss=0.0002019, whisper_loss=0.08535, over 21469.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01062, ecapa_loss=0.0001485, whisper_loss=0.09149, over 3967174.71 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:21:57,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3352100.0, ans=0.0 2024-08-17 14:22:05,612 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
38 from LS+wenet, 26 from Vox, 27 from AS 2024-08-17 14:22:10,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3352200.0, ans=0.125 2024-08-17 14:22:21,716 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 21 from Vox, 45 from AS 2024-08-17 14:22:23,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3352300.0, ans=0.0 2024-08-17 14:22:24,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-17 14:22:29,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3352400.0, ans=0.125 2024-08-17 14:22:30,333 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 14 from Vox, 36 from AS 2024-08-17 14:22:31,548 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 20 from Vox, 27 from AS 2024-08-17 14:22:41,336 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3250, loss[loss=0.1273, beats_loss=0.009865, ecapa_loss=0.000171, whisper_loss=0.1157, over 22468.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01058, ecapa_loss=0.00015, whisper_loss=0.09174, over 3977304.20 frames. ], batch size: 87, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:22:57,615 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 from AS 2024-08-17 14:23:02,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.16 vs. 
limit=22.5 2024-08-17 14:23:09,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3352700.0, ans=10.0 2024-08-17 14:23:17,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3352800.0, ans=0.0 2024-08-17 14:23:21,274 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 14:23:27,095 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.470e+01 2.720e+01 3.088e+01 1.006e+02, threshold=5.440e+01, percent-clipped=1.0 2024-08-17 14:23:30,574 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0725674256682396, model_norm_threshold=54.39925765991211 2024-08-17 14:23:30,734 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.092e+05, grad_sumsq=1.076e+07, orig_rms_sq=1.015e-02 2024-08-17 14:23:36,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.61 vs. limit=22.5 2024-08-17 14:23:40,289 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 30 from Vox, 26 from AS 2024-08-17 14:23:43,724 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3300, loss[loss=0.1139, beats_loss=0.01033, ecapa_loss=0.0001678, whisper_loss=0.1019, over 23547.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0106, ecapa_loss=0.0001499, whisper_loss=0.09149, over 3970682.49 frames. ], batch size: 94, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:23:47,520 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 14:23:58,945 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
28 from LS+wenet, 18 from Vox, 32 from AS 2024-08-17 14:24:00,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-08-17 14:24:03,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3353100.0, ans=0.0 2024-08-17 14:24:20,917 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 17 from Vox, 22 from AS 2024-08-17 14:24:26,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3353300.0, ans=0.125 2024-08-17 14:24:29,877 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 24 from Vox, 32 from AS 2024-08-17 14:24:41,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3353400.0, ans=0.125 2024-08-17 14:24:45,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3350, loss[loss=0.1049, beats_loss=0.01294, ecapa_loss=0.0001206, whisper_loss=0.09079, over 23682.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001489, whisper_loss=0.09144, over 3942196.80 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:24:58,189 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
10 from LS+wenet, 18 from Vox, 25 from AS 2024-08-17 14:25:06,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3353600.0, ans=0.125 2024-08-17 14:25:07,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3353600.0, ans=0.0 2024-08-17 14:25:10,500 WARNING [optim.py:496] (3/4) Scaling gradients by 0.057717882096767426, model_norm_threshold=54.39925765991211 2024-08-17 14:25:10,667 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.125e+05, grad_sumsq=1.125e+05, orig_rms_sq=1.000e+00 2024-08-17 14:25:10,816 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 29 from Vox, 37 from AS 2024-08-17 14:25:31,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.095e+01 2.365e+01 2.660e+01 2.915e+01 9.425e+02, threshold=5.320e+01, percent-clipped=5.0 2024-08-17 14:25:34,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3353900.0, ans=0.0 2024-08-17 14:25:37,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3353900.0, ans=0.1 2024-08-17 14:25:39,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3353900.0, ans=0.025 2024-08-17 14:25:43,915 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 from AS 2024-08-17 14:25:47,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3400, loss[loss=0.1057, beats_loss=0.01025, ecapa_loss=0.0001634, whisper_loss=0.09384, over 22501.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01069, ecapa_loss=0.0001502, whisper_loss=0.09099, over 3926725.04 frames. ], batch size: 92, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:25:48,123 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.751e-03 2024-08-17 14:25:49,016 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 from AS 2024-08-17 14:25:52,866 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 from AS 2024-08-17 14:25:53,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=15.0 2024-08-17 14:25:59,040 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 16 from Vox, 32 from AS 2024-08-17 14:26:05,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3354100.0, ans=0.125 2024-08-17 14:26:06,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3354100.0, ans=0.125 2024-08-17 14:26:08,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3354100.0, ans=0.0 2024-08-17 14:26:14,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0 2024-08-17 14:26:23,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.38 vs. limit=10.0 2024-08-17 14:26:46,487 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
22 from LS+wenet, 17 from Vox, 26 from AS 2024-08-17 14:26:50,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3450, loss[loss=0.09156, beats_loss=0.01196, ecapa_loss=0.0001697, whisper_loss=0.0779, over 19207.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001508, whisper_loss=0.09139, over 3944458.66 frames. ], batch size: 81, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:26:52,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2024-08-17 14:27:12,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.65 vs. limit=15.0 2024-08-17 14:27:13,683 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 from AS 2024-08-17 14:27:24,095 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 25 from LS+wenet, 29 from Vox, 42 from AS 2024-08-17 14:27:24,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3354700.0, ans=0.05 2024-08-17 14:27:25,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3354700.0, ans=0.125 2024-08-17 14:27:32,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3354800.0, ans=0.125 2024-08-17 14:27:37,062 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.241e+01 2.529e+01 2.817e+01 6.166e+01, threshold=5.057e+01, percent-clipped=1.0 2024-08-17 14:27:46,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3354900.0, ans=0.1 2024-08-17 14:27:47,323 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3354900.0, ans=0.2 2024-08-17 14:27:49,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=12.0 2024-08-17 14:27:52,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3500, loss[loss=0.103, beats_loss=0.008923, ecapa_loss=0.0001299, whisper_loss=0.09279, over 14958.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001506, whisper_loss=0.09089, over 3929540.01 frames. ], batch size: 56, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:27:57,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3355000.0, ans=0.0 2024-08-17 14:28:00,617 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 from AS 2024-08-17 14:28:02,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3355000.0, ans=0.0 2024-08-17 14:28:03,122 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.498e+00 2024-08-17 14:28:11,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3355100.0, ans=0.5 2024-08-17 14:28:15,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3355100.0, ans=0.125 2024-08-17 14:28:18,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3355200.0, ans=0.04949747468305833 2024-08-17 14:28:21,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3355200.0, ans=0.125 2024-08-17 14:28:32,758 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 14:28:33,937 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 14:28:41,468 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 14:28:44,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.90 vs. limit=22.5 2024-08-17 14:28:48,540 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-17 14:28:52,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3355500.0, ans=0.2 2024-08-17 14:28:53,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3550, loss[loss=0.104, beats_loss=0.01036, ecapa_loss=0.0001481, whisper_loss=0.09213, over 19693.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01054, ecapa_loss=0.0001503, whisper_loss=0.09183, over 3912791.58 frames. ], batch size: 80, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:28:59,609 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-17 14:29:22,906 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 14:29:24,121 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 14:29:30,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3355800.0, ans=0.125 2024-08-17 14:29:38,580 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.236e+01 2.510e+01 2.747e+01 4.772e+01, threshold=5.021e+01, percent-clipped=0.0 2024-08-17 14:29:54,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3600, loss[loss=0.08598, beats_loss=0.01285, ecapa_loss=0.0001122, whisper_loss=0.07201, over 16501.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01064, ecapa_loss=0.0001491, whisper_loss=0.09184, over 3906291.46 frames. ], batch size: 64, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:29:55,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2024-08-17 14:30:16,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3356100.0, ans=0.0 2024-08-17 14:30:19,292 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 14:30:32,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-17 14:30:33,545 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 14:30:35,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3356300.0, ans=0.1 2024-08-17 14:30:44,779 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
21 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 14:30:55,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3356500.0, ans=0.0 2024-08-17 14:30:56,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3650, loss[loss=0.103, beats_loss=0.01115, ecapa_loss=0.0001693, whisper_loss=0.09014, over 15517.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01067, ecapa_loss=0.0001488, whisper_loss=0.09149, over 3879460.14 frames. ], batch size: 65, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:30:57,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3356500.0, ans=0.1 2024-08-17 14:31:05,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3356500.0, ans=0.1 2024-08-17 14:31:08,600 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-17 14:31:13,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-17 14:31:19,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3356700.0, ans=0.125 2024-08-17 14:31:27,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3356700.0, ans=0.2 2024-08-17 14:31:28,054 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 14:31:39,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3356800.0, ans=0.2 2024-08-17 14:31:40,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.623e+01 2.172e+01 2.480e+01 2.940e+01 4.712e+01, threshold=4.960e+01, percent-clipped=0.0 2024-08-17 14:31:46,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3356900.0, ans=0.125 2024-08-17 14:31:54,404 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 14:31:56,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3700, loss[loss=0.1013, beats_loss=0.01083, ecapa_loss=0.0001454, whisper_loss=0.08902, over 16696.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01071, ecapa_loss=0.0001499, whisper_loss=0.0906, over 3857375.77 frames. ], batch size: 65, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:32:14,436 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 31 from Vox, 26 fro AS 2024-08-17 14:32:27,978 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 19 from LS+wenet, 20 from Vox, 54 fro AS 2024-08-17 14:32:37,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3357300.0, ans=0.1 2024-08-17 14:32:39,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3357300.0, ans=0.125 2024-08-17 14:32:45,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3357400.0, ans=0.125 2024-08-17 14:32:50,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.26 vs. 
limit=12.0 2024-08-17 14:32:51,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3357400.0, ans=0.125 2024-08-17 14:32:54,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-17 14:32:55,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3357400.0, ans=0.0 2024-08-17 14:32:58,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-17 14:32:58,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3750, loss[loss=0.08458, beats_loss=0.01254, ecapa_loss=0.0001547, whisper_loss=0.0705, over 18540.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.000148, whisper_loss=0.09036, over 3837168.58 frames. ], batch size: 78, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:33:07,575 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 14:33:19,835 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 14:33:21,668 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.369e+01 2024-08-17 14:33:23,557 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-17 14:33:25,986 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 14:33:28,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3357700.0, ans=0.0 2024-08-17 14:33:44,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.285e+01 2.536e+01 2.853e+01 4.848e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-17 14:33:45,964 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 14:33:49,868 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-17 14:33:52,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3357900.0, ans=0.0 2024-08-17 14:33:53,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3357900.0, ans=0.0 2024-08-17 14:34:00,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3800, loss[loss=0.0979, beats_loss=0.01143, ecapa_loss=0.0001415, whisper_loss=0.08506, over 20291.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.0001477, whisper_loss=0.09087, over 3852861.77 frames. ], batch size: 81, lr: 2.67e-03, grad_scale: 1.152921504606847e+18 2024-08-17 14:34:05,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.12 vs. limit=10.0 2024-08-17 14:34:11,576 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 14:34:13,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3358100.0, ans=0.125 2024-08-17 14:34:18,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3358100.0, ans=0.2 2024-08-17 14:34:24,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.56 vs. limit=15.0 2024-08-17 14:34:28,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3358200.0, ans=0.0 2024-08-17 14:34:30,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3358200.0, ans=0.125 2024-08-17 14:34:35,668 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.47 vs. limit=22.5 2024-08-17 14:34:46,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.55 vs. limit=22.5 2024-08-17 14:34:56,142 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-17 14:35:02,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3850, loss[loss=0.1056, beats_loss=0.009225, ecapa_loss=0.0001546, whisper_loss=0.09483, over 22697.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.000147, whisper_loss=0.09078, over 3899639.87 frames. ], batch size: 90, lr: 2.67e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:35:15,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. 
limit=15.0 2024-08-17 14:35:27,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3358700.0, ans=0.1 2024-08-17 14:35:29,532 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 14:35:38,431 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 14:35:40,797 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 14:35:49,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.270e+01 2.483e+01 2.770e+01 3.754e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-17 14:35:58,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3358900.0, ans=0.2 2024-08-17 14:36:03,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3359000.0, ans=0.125 2024-08-17 14:36:03,809 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3900, loss[loss=0.101, beats_loss=0.01097, ecapa_loss=0.000123, whisper_loss=0.08876, over 20603.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.0001461, whisper_loss=0.09113, over 3931074.40 frames. ], batch size: 79, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:36:23,368 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-17 14:36:39,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3359300.0, ans=0.0 2024-08-17 14:36:46,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. 
limit=12.0 2024-08-17 14:36:47,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.93 vs. limit=22.5 2024-08-17 14:37:05,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3359500.0, ans=0.125 2024-08-17 14:37:05,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3359500.0, ans=0.1 2024-08-17 14:37:06,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 3950, loss[loss=0.09373, beats_loss=0.0116, ecapa_loss=0.0001182, whisper_loss=0.08095, over 18354.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01067, ecapa_loss=0.0001456, whisper_loss=0.09067, over 3892096.19 frames. ], batch size: 72, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:37:08,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3359500.0, ans=0.125 2024-08-17 14:37:10,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3359500.0, ans=0.04949747468305833 2024-08-17 14:37:17,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3359500.0, ans=0.125 2024-08-17 14:37:22,675 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
25 from LS+wenet, 9 from Vox, 21 fro AS 2024-08-17 14:37:22,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3359600.0, ans=0.2 2024-08-17 14:37:37,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3359700.0, ans=0.0 2024-08-17 14:37:43,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3359700.0, ans=0.0 2024-08-17 14:37:57,418 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 31 from Vox, 28 fro AS 2024-08-17 14:37:58,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.357e+01 2.553e+01 2.882e+01 5.685e+01, threshold=5.106e+01, percent-clipped=1.0 2024-08-17 14:38:06,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3359900.0, ans=0.125 2024-08-17 14:38:11,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3359900.0, ans=0.125 2024-08-17 14:38:17,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.47 vs. limit=22.5 2024-08-17 14:38:18,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4000, loss[loss=0.1151, beats_loss=0.01088, ecapa_loss=0.0001297, whisper_loss=0.1029, over 21192.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01067, ecapa_loss=0.0001471, whisper_loss=0.09023, over 3884051.51 frames. ], batch size: 82, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:38:19,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3360000.0, ans=0.09899494936611666 2024-08-17 14:38:20,096 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
20 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-17 14:38:20,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3360000.0, ans=0.125 2024-08-17 14:38:22,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2024-08-17 14:38:24,449 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 14:38:30,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3360000.0, ans=0.0 2024-08-17 14:38:40,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.62 vs. limit=22.5 2024-08-17 14:38:46,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3360200.0, ans=0.125 2024-08-17 14:39:07,904 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 14:39:12,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3360300.0, ans=0.1 2024-08-17 14:39:26,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3360400.0, ans=0.0 2024-08-17 14:39:34,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4050, loss[loss=0.1017, beats_loss=0.009574, ecapa_loss=0.0001324, whisper_loss=0.0908, over 22368.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01069, ecapa_loss=0.0001481, whisper_loss=0.09043, over 3869760.40 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:39:46,405 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
19 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 14:39:53,333 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 14:40:01,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3360600.0, ans=0.125 2024-08-17 14:40:03,661 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 14:40:06,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=12.0 2024-08-17 14:40:13,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3360700.0, ans=0.125 2024-08-17 14:40:13,955 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 14:40:17,290 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-17 14:40:17,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.85 vs. limit=10.0 2024-08-17 14:40:29,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.266e+01 2.528e+01 2.779e+01 4.224e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 14:40:30,068 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 14:40:30,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3360800.0, ans=0.0 2024-08-17 14:40:30,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.17 vs. 
limit=15.0 2024-08-17 14:40:30,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.42 vs. limit=15.0 2024-08-17 14:40:38,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2024-08-17 14:40:39,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.17 vs. limit=15.0 2024-08-17 14:40:43,191 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 14:40:47,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4100, loss[loss=0.1093, beats_loss=0.009676, ecapa_loss=0.0001589, whisper_loss=0.098, over 18706.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01065, ecapa_loss=0.0001492, whisper_loss=0.09042, over 3863002.86 frames. ], batch size: 74, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:40:49,088 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 14:41:00,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3361000.0, ans=0.125 2024-08-17 14:41:10,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3361100.0, ans=0.1 2024-08-17 14:41:18,901 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-17 14:41:36,807 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 14:41:45,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3361300.0, ans=0.125 2024-08-17 14:42:02,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4150, loss[loss=0.09014, beats_loss=0.01316, ecapa_loss=0.0001333, whisper_loss=0.07564, over 13774.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001489, whisper_loss=0.09044, over 3865318.63 frames. ], batch size: 55, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:42:12,319 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-17 14:42:27,925 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-17 14:42:38,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3361700.0, ans=0.0 2024-08-17 14:42:41,351 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-17 14:42:56,844 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.341e+01 2.562e+01 2.816e+01 4.019e+01, threshold=5.124e+01, percent-clipped=0.0 2024-08-17 14:42:58,227 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-17 14:42:58,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3361900.0, ans=0.125 2024-08-17 14:43:13,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4200, loss[loss=0.1092, beats_loss=0.007539, ecapa_loss=0.0001673, whisper_loss=0.09995, over 19878.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001493, whisper_loss=0.08974, over 3843131.09 frames. 
], batch size: 76, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:43:40,970 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 38 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 14:43:43,725 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 14:43:44,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3362200.0, ans=0.125 2024-08-17 14:43:52,830 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 14:44:01,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3362300.0, ans=0.125 2024-08-17 14:44:20,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3362400.0, ans=0.125 2024-08-17 14:44:23,948 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-17 14:44:26,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4250, loss[loss=0.08176, beats_loss=0.01224, ecapa_loss=0.0001602, whisper_loss=0.06792, over 21987.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001486, whisper_loss=0.08959, over 3859190.12 frames. ], batch size: 96, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:44:46,455 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 14:44:49,188 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06460781395435333, model_norm_threshold=51.23520278930664 2024-08-17 14:44:49,361 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.238e+04, grad_sumsq=7.238e+04, orig_rms_sq=1.000e+00 2024-08-17 14:44:50,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-08-17 14:45:04,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3362700.0, ans=0.0 2024-08-17 14:45:16,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3362800.0, ans=0.07 2024-08-17 14:45:22,304 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-17 14:45:23,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.395e+01 2.661e+01 3.176e+01 7.930e+02, threshold=5.321e+01, percent-clipped=4.0 2024-08-17 14:45:36,395 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-17 14:45:41,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4300, loss[loss=0.1011, beats_loss=0.0114, ecapa_loss=0.0001253, whisper_loss=0.08843, over 20752.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001501, whisper_loss=0.08966, over 3864245.70 frames. ], batch size: 84, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:45:46,264 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
16 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-17 14:45:55,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3363100.0, ans=0.2 2024-08-17 14:46:04,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3363100.0, ans=0.125 2024-08-17 14:46:08,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3363100.0, ans=0.125 2024-08-17 14:46:22,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3363200.0, ans=0.0 2024-08-17 14:46:30,668 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 14:46:38,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3363300.0, ans=0.1 2024-08-17 14:46:42,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3363400.0, ans=0.125 2024-08-17 14:46:47,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.30 vs. limit=6.0 2024-08-17 14:46:57,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4350, loss[loss=0.1141, beats_loss=0.009907, ecapa_loss=0.0001544, whisper_loss=0.1027, over 23937.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001482, whisper_loss=0.09016, over 3871485.41 frames. ], batch size: 92, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:47:08,616 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
23 from LS+wenet, 13 from Vox, 19 fro AS 2024-08-17 14:47:20,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3363600.0, ans=0.125 2024-08-17 14:47:23,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3363600.0, ans=0.125 2024-08-17 14:47:46,123 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.54 vs. limit=10.0 2024-08-17 14:47:49,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3363800.0, ans=0.125 2024-08-17 14:47:53,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.243e+01 2.531e+01 2.839e+01 4.488e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-17 14:48:10,582 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4400, loss[loss=0.09538, beats_loss=0.01124, ecapa_loss=0.0001502, whisper_loss=0.08264, over 23051.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001466, whisper_loss=0.09082, over 3904622.21 frames. ], batch size: 96, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:48:48,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3364200.0, ans=0.125 2024-08-17 14:48:51,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-17 14:49:20,736 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 14:49:21,713 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4450, loss[loss=0.09618, beats_loss=0.0113, ecapa_loss=0.0001192, whisper_loss=0.08369, over 20133.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001456, whisper_loss=0.09084, over 3884975.50 frames. ], batch size: 78, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:49:26,380 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 14:49:37,702 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 14:49:42,896 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-17 14:49:46,821 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 14:50:01,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3364700.0, ans=0.02 2024-08-17 14:50:11,410 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 14:50:15,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3364800.0, ans=0.0 2024-08-17 14:50:16,740 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.265e+01 2.477e+01 2.746e+01 3.870e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-17 14:50:17,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:50:25,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3364900.0, ans=0.0 2024-08-17 14:50:28,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3364900.0, ans=0.125 2024-08-17 14:50:33,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4500, loss[loss=0.0978, beats_loss=0.006476, ecapa_loss=0.0001781, whisper_loss=0.08954, over 15784.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.0105, ecapa_loss=0.0001468, whisper_loss=0.09182, over 3903670.08 frames. ], batch size: 61, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:50:51,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3365100.0, ans=0.2 2024-08-17 14:50:52,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3365100.0, ans=0.125 2024-08-17 14:51:21,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3365300.0, ans=0.0 2024-08-17 14:51:34,252 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 14:51:37,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3365400.0, ans=0.125 2024-08-17 14:51:37,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3365400.0, ans=0.04949747468305833 2024-08-17 14:51:42,567 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4550, loss[loss=0.08971, beats_loss=0.01161, ecapa_loss=0.0001246, whisper_loss=0.07686, over 16254.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001465, whisper_loss=0.09139, over 3898756.49 frames. ], batch size: 64, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:51:46,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=15.0 2024-08-17 14:51:52,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3365500.0, ans=0.1 2024-08-17 14:51:53,871 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 31 from Vox, 30 fro AS 2024-08-17 14:51:58,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.90 vs. limit=22.5 2024-08-17 14:51:59,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3365600.0, ans=0.0 2024-08-17 14:52:06,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3365600.0, ans=0.0 2024-08-17 14:52:17,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3365700.0, ans=0.125 2024-08-17 14:52:21,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3365800.0, ans=0.0 2024-08-17 14:52:31,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.242e+01 2.494e+01 2.758e+01 4.249e+01, threshold=4.988e+01, percent-clipped=0.0 2024-08-17 14:52:45,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3365900.0, ans=0.2 2024-08-17 14:52:46,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3366000.0, ans=0.125 2024-08-17 14:52:47,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4600, loss[loss=0.08628, beats_loss=0.0111, ecapa_loss=0.0001364, whisper_loss=0.07382, over 15951.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01058, ecapa_loss=0.0001466, whisper_loss=0.0916, over 3901565.08 frames. ], batch size: 65, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:53:00,835 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-17 14:53:08,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3366100.0, ans=0.0 2024-08-17 14:53:26,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3366300.0, ans=0.125 2024-08-17 14:53:29,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.37 vs. limit=22.5 2024-08-17 14:53:44,710 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-17 14:53:48,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4650, loss[loss=0.1048, beats_loss=0.01023, ecapa_loss=0.0001815, whisper_loss=0.0928, over 17150.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01052, ecapa_loss=0.0001475, whisper_loss=0.09222, over 3907947.08 frames. ], batch size: 72, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:53:57,343 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-17 14:53:57,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=12.0 2024-08-17 14:54:14,716 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-17 14:54:14,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3366700.0, ans=0.0 2024-08-17 14:54:19,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=15.0 2024-08-17 14:54:28,446 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
22 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-17 14:54:34,697 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-17 14:54:35,764 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.359e+01 2.591e+01 2.983e+01 4.791e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-17 14:54:37,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3366900.0, ans=0.0 2024-08-17 14:54:43,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3366900.0, ans=0.1 2024-08-17 14:54:51,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4700, loss[loss=0.07939, beats_loss=0.01208, ecapa_loss=0.0001263, whisper_loss=0.06605, over 14496.00 frames. ], tot_loss[loss=0.1043, beats_loss=0.0105, ecapa_loss=0.000148, whisper_loss=0.0923, over 3921232.57 frames. ], batch size: 57, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:54:55,181 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 14:54:56,389 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 14:55:08,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3367100.0, ans=0.125 2024-08-17 14:55:13,882 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 14:55:14,088 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 14:55:16,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3367200.0, ans=0.0 2024-08-17 14:55:22,621 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.314e-01 2024-08-17 14:55:37,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3367300.0, ans=0.0 2024-08-17 14:55:39,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3367300.0, ans=0.0 2024-08-17 14:55:48,585 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-17 14:55:53,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4750, loss[loss=0.09984, beats_loss=0.01011, ecapa_loss=0.0001437, whisper_loss=0.08829, over 22775.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.09206, over 3909503.96 frames. ], batch size: 91, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:55:58,265 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 9 from Vox, 27 fro AS 2024-08-17 14:56:31,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3367800.0, ans=0.125 2024-08-17 14:56:40,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.331e+01 2.583e+01 3.020e+01 4.522e+01, threshold=5.166e+01, percent-clipped=0.0 2024-08-17 14:56:43,990 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
31 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 14:56:53,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3367900.0, ans=0.1 2024-08-17 14:56:55,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4800, loss[loss=0.1148, beats_loss=0.009067, ecapa_loss=0.0001616, whisper_loss=0.1041, over 16044.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01042, ecapa_loss=0.0001486, whisper_loss=0.0922, over 3904612.31 frames. ], batch size: 62, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:57:03,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3368000.0, ans=0.2 2024-08-17 14:57:07,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3368100.0, ans=0.1 2024-08-17 14:57:13,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3368100.0, ans=0.125 2024-08-17 14:57:14,611 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09035546332597733, model_norm_threshold=51.66227722167969 2024-08-17 14:57:14,784 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.290e+04, grad_sumsq=4.290e+04, orig_rms_sq=1.000e+00 2024-08-17 14:57:40,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3368300.0, ans=0.0 2024-08-17 14:57:47,047 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-17 14:57:57,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4850, loss[loss=0.1019, beats_loss=0.01174, ecapa_loss=0.0001339, whisper_loss=0.08887, over 17543.00 frames. 
], tot_loss[loss=0.1037, beats_loss=0.01044, ecapa_loss=0.0001504, whisper_loss=0.0918, over 3877784.05 frames. ], batch size: 69, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:58:09,035 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-17 14:58:09,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3368600.0, ans=0.2 2024-08-17 14:58:32,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3368700.0, ans=0.125 2024-08-17 14:58:36,825 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-17 14:58:38,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3368800.0, ans=0.04949747468305833 2024-08-17 14:58:44,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.372e+01 2.629e+01 2.886e+01 5.718e+02, threshold=5.259e+01, percent-clipped=2.0 2024-08-17 14:58:55,180 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 13 from Vox, 37 fro AS 2024-08-17 14:58:57,473 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 14:58:58,572 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4900, loss[loss=0.1112, beats_loss=0.01111, ecapa_loss=0.0001712, whisper_loss=0.09833, over 18717.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01061, ecapa_loss=0.000149, whisper_loss=0.09106, over 3881775.27 frames. 
], batch size: 78, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 14:59:10,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3369100.0, ans=0.025 2024-08-17 14:59:15,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3369100.0, ans=0.125 2024-08-17 14:59:15,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-08-17 14:59:42,239 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 28 from Vox, 23 fro AS 2024-08-17 14:59:43,354 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 24 from LS+wenet, 28 from Vox, 44 fro AS 2024-08-17 14:59:43,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3369300.0, ans=0.0 2024-08-17 14:59:49,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3369400.0, ans=0.125 2024-08-17 14:59:54,428 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 15:00:00,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 4950, loss[loss=0.09664, beats_loss=0.009885, ecapa_loss=0.0001385, whisper_loss=0.08537, over 19637.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001484, whisper_loss=0.09049, over 3886168.62 frames. ], batch size: 77, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:00:01,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3369500.0, ans=0.1 2024-08-17 15:00:03,349 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-17 15:00:09,705 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 15:00:18,120 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 15:00:26,820 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-17 15:00:41,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3369800.0, ans=0.2 2024-08-17 15:00:44,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3369800.0, ans=0.125 2024-08-17 15:00:47,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.355e+01 2.551e+01 2.775e+01 1.010e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-17 15:00:49,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-17 15:00:50,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3369900.0, ans=0.1 2024-08-17 15:00:52,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3369900.0, ans=0.09899494936611666 2024-08-17 15:00:54,020 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 15:00:56,502 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 15:00:59,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3369900.0, ans=0.125 2024-08-17 15:01:01,312 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 15:01:02,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5000, loss[loss=0.09921, beats_loss=0.01161, ecapa_loss=0.0001164, whisper_loss=0.08644, over 22563.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001486, whisper_loss=0.09011, over 3902435.31 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:01:17,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=3370100.0, ans=15.0 2024-08-17 15:01:27,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3370200.0, ans=0.1 2024-08-17 15:01:41,037 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 31 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-17 15:01:41,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3370300.0, ans=0.0 2024-08-17 15:01:46,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3370300.0, ans=0.1 2024-08-17 15:02:01,120 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 13 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-17 15:02:04,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5050, loss[loss=0.0973, beats_loss=0.009434, ecapa_loss=0.0001637, whisper_loss=0.08622, over 15009.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001488, whisper_loss=0.09081, over 3863728.43 frames. ], batch size: 62, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:02:16,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3370600.0, ans=0.0 2024-08-17 15:02:26,144 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
34 from LS+wenet, 11 from Vox, 47 fro AS 2024-08-17 15:02:36,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3370700.0, ans=0.125 2024-08-17 15:02:52,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.426e+01 2.645e+01 3.149e+01 5.792e+01, threshold=5.289e+01, percent-clipped=1.0 2024-08-17 15:03:06,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5100, loss[loss=0.09194, beats_loss=0.008851, ecapa_loss=0.0001744, whisper_loss=0.08135, over 22667.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001496, whisper_loss=0.09107, over 3859855.69 frames. ], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:03:08,043 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 15:03:15,329 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-17 15:03:18,903 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 15:03:24,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.88 vs. limit=15.0 2024-08-17 15:03:28,815 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
17 from LS+wenet, 33 from Vox, 25 fro AS 2024-08-17 15:03:37,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3371200.0, ans=0.125 2024-08-17 15:03:41,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3371200.0, ans=0.125 2024-08-17 15:03:42,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3371300.0, ans=0.09899494936611666 2024-08-17 15:03:44,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3371300.0, ans=0.125 2024-08-17 15:03:53,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3371300.0, ans=0.125 2024-08-17 15:03:54,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3371400.0, ans=0.04949747468305833 2024-08-17 15:03:58,090 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 15:04:03,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.54 vs. limit=10.0 2024-08-17 15:04:08,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5150, loss[loss=0.09821, beats_loss=0.01059, ecapa_loss=0.0001516, whisper_loss=0.08611, over 13215.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01038, ecapa_loss=0.0001508, whisper_loss=0.09149, over 3859250.02 frames. ], batch size: 53, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:04:10,831 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-17 15:04:12,092 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
20 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-17 15:04:12,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3371500.0, ans=0.1 2024-08-17 15:04:14,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3371500.0, ans=0.025 2024-08-17 15:04:15,720 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-17 15:04:17,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3371500.0, ans=0.2 2024-08-17 15:04:28,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.48 vs. limit=15.0 2024-08-17 15:04:32,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2024-08-17 15:04:44,449 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 10 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 15:04:46,932 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 15:04:47,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=3371800.0, ans=0.02 2024-08-17 15:04:55,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.283e+01 2.471e+01 2.711e+01 4.570e+01, threshold=4.941e+01, percent-clipped=0.0 2024-08-17 15:05:01,753 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 15:05:03,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3371900.0, ans=0.125 2024-08-17 15:05:07,171 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.931e+00 2024-08-17 15:05:10,556 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5200, loss[loss=0.1311, beats_loss=0.009816, ecapa_loss=0.0001243, whisper_loss=0.1201, over 23559.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001497, whisper_loss=0.09134, over 3838377.37 frames. ], batch size: 88, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:05:11,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=3372000.0, ans=22.5 2024-08-17 15:05:11,844 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 15:05:25,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3372100.0, ans=0.015 2024-08-17 15:05:47,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3372300.0, ans=0.2 2024-08-17 15:06:12,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5250, loss[loss=0.1219, beats_loss=0.008988, ecapa_loss=0.0001372, whisper_loss=0.1115, over 18293.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001502, whisper_loss=0.09111, over 3823901.73 frames. ], batch size: 69, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:06:14,428 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
27 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-17 15:06:29,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3372600.0, ans=0.1 2024-08-17 15:06:31,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3372600.0, ans=0.1 2024-08-17 15:06:38,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3372700.0, ans=0.125 2024-08-17 15:06:53,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3372800.0, ans=0.125 2024-08-17 15:06:58,660 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-17 15:07:01,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.404e+01 2.586e+01 2.813e+01 5.298e+01, threshold=5.172e+01, percent-clipped=1.0 2024-08-17 15:07:12,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3372900.0, ans=0.125 2024-08-17 15:07:15,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5300, loss[loss=0.1477, beats_loss=0.007147, ecapa_loss=0.0001507, whisper_loss=0.139, over 22578.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001489, whisper_loss=0.09059, over 3814044.96 frames. ], batch size: 84, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:07:25,770 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-17 15:07:32,148 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
18 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-17 15:07:41,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3373200.0, ans=0.0 2024-08-17 15:08:02,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-17 15:08:09,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3373400.0, ans=0.0 2024-08-17 15:08:17,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5350, loss[loss=0.09261, beats_loss=0.01026, ecapa_loss=0.0001596, whisper_loss=0.08075, over 22821.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001488, whisper_loss=0.09023, over 3818068.09 frames. ], batch size: 95, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:08:30,241 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-17 15:08:52,718 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-17 15:09:05,091 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.366e+01 2.581e+01 2.898e+01 3.375e+02, threshold=5.162e+01, percent-clipped=2.0 2024-08-17 15:09:19,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3374000.0, ans=0.0 2024-08-17 15:09:20,107 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5400, loss[loss=0.0977, beats_loss=0.008059, ecapa_loss=0.000115, whisper_loss=0.08849, over 15351.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001479, whisper_loss=0.09058, over 3808148.12 frames. 
], batch size: 55, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:09:23,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-17 15:09:29,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3374000.0, ans=0.0 2024-08-17 15:09:30,030 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-17 15:09:51,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2024-08-17 15:09:55,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3374200.0, ans=0.125 2024-08-17 15:10:08,033 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 17 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 15:10:08,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3374400.0, ans=0.0 2024-08-17 15:10:11,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.48 vs. limit=22.5 2024-08-17 15:10:17,858 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-17 15:10:21,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5450, loss[loss=0.09289, beats_loss=0.01025, ecapa_loss=0.000171, whisper_loss=0.08092, over 18321.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001481, whisper_loss=0.09045, over 3845513.60 frames. 
], batch size: 69, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:10:31,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3374500.0, ans=0.1 2024-08-17 15:10:35,416 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 15:10:37,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-17 15:10:40,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0 2024-08-17 15:10:41,417 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 15:10:44,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3374600.0, ans=0.125 2024-08-17 15:11:08,522 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.416e+01 2.763e+01 3.086e+01 2.790e+02, threshold=5.526e+01, percent-clipped=2.0 2024-08-17 15:11:11,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3374900.0, ans=0.0 2024-08-17 15:11:20,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3374900.0, ans=0.125 2024-08-17 15:11:21,104 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 15:11:23,284 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5500, loss[loss=0.1008, beats_loss=0.01253, ecapa_loss=0.0001107, whisper_loss=0.0872, over 22313.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01073, ecapa_loss=0.0001471, whisper_loss=0.08972, over 3818917.23 frames. 
], batch size: 86, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:11:26,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3375000.0, ans=10.0 2024-08-17 15:11:33,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3375000.0, ans=0.125 2024-08-17 15:11:34,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3375000.0, ans=0.1 2024-08-17 15:11:35,184 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 from AS 2024-08-17 15:11:36,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3375100.0, ans=0.125 2024-08-17 15:11:48,687 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 29 from Vox, 38 from AS 2024-08-17 15:11:55,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3375200.0, ans=0.0 2024-08-17 15:12:26,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5550, loss[loss=0.1115, beats_loss=0.009961, ecapa_loss=0.0001175, whisper_loss=0.1004, over 15448.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01066, ecapa_loss=0.000147, whisper_loss=0.09008, over 3783073.59 frames. ], batch size: 55, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:12:27,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3375500.0, ans=0.125 2024-08-17 15:12:48,725 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
30 from LS+wenet, 22 from Vox, 38 from AS 2024-08-17 15:12:54,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3375700.0, ans=0.125 2024-08-17 15:13:06,912 WARNING [optim.py:496] (3/4) Scaling gradients by 0.024026568979024887, model_norm_threshold=55.25676727294922 2024-08-17 15:13:07,084 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.047e+05, grad_sumsq=1.492e+05, orig_rms_sq=3.383e+00 2024-08-17 15:13:10,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.85 vs. limit=22.5 2024-08-17 15:13:12,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=3375800.0, ans=0.1 2024-08-17 15:13:13,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.279e+01 2.545e+01 2.862e+01 2.300e+03, threshold=5.090e+01, percent-clipped=1.0 2024-08-17 15:13:13,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3375800.0, ans=0.1 2024-08-17 15:13:20,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3375900.0, ans=0.125 2024-08-17 15:13:28,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5600, loss[loss=0.1081, beats_loss=0.009911, ecapa_loss=0.0001494, whisper_loss=0.09668, over 16450.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001474, whisper_loss=0.09059, over 3826228.42 frames. 
], batch size: 66, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:13:28,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3376000.0, ans=0.125 2024-08-17 15:13:39,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3376000.0, ans=0.125 2024-08-17 15:13:39,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3376000.0, ans=0.1 2024-08-17 15:13:42,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3376100.0, ans=0.125 2024-08-17 15:13:50,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3376100.0, ans=0.0 2024-08-17 15:13:59,786 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 from AS 2024-08-17 15:14:17,134 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 17 from Vox, 32 from AS 2024-08-17 15:14:18,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3376400.0, ans=0.125 2024-08-17 15:14:23,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3376400.0, ans=0.025 2024-08-17 15:14:29,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3376500.0, ans=0.125 2024-08-17 15:14:30,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5650, loss[loss=0.12, beats_loss=0.008827, ecapa_loss=0.0001512, whisper_loss=0.1097, over 21411.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.000147, whisper_loss=0.09012, over 3858302.55 frames. 
], batch size: 84, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:14:50,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3376600.0, ans=0.0 2024-08-17 15:14:55,331 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 from AS 2024-08-17 15:14:56,856 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 16 from Vox, 43 from AS 2024-08-17 15:14:59,198 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 from AS 2024-08-17 15:15:01,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=22.5 2024-08-17 15:15:09,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3376800.0, ans=0.2 2024-08-17 15:15:13,319 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 15:15:17,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.354e+01 2.556e+01 3.090e+01 4.828e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-17 15:15:20,548 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 22 from Vox, 46 from AS 2024-08-17 15:15:24,122 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 from AS 2024-08-17 15:15:24,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. 
limit=22.5 2024-08-17 15:15:29,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3376900.0, ans=0.125 2024-08-17 15:15:32,713 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5700, loss[loss=0.1161, beats_loss=0.009075, ecapa_loss=0.0001402, whisper_loss=0.1056, over 21247.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01065, ecapa_loss=0.0001473, whisper_loss=0.08975, over 3834056.75 frames. ], batch size: 83, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:15:34,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=3377000.0, ans=0.05 2024-08-17 15:15:36,690 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 26 from Vox, 31 from AS 2024-08-17 15:15:45,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3377100.0, ans=0.125 2024-08-17 15:15:55,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3377100.0, ans=0.07 2024-08-17 15:15:56,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3377200.0, ans=0.125 2024-08-17 15:16:06,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3377200.0, ans=0.125 2024-08-17 15:16:10,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3377300.0, ans=0.0 2024-08-17 15:16:20,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3377300.0, ans=0.125 2024-08-17 15:16:23,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, 
batch_count=3377400.0, ans=0.2 2024-08-17 15:16:30,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3377400.0, ans=0.1 2024-08-17 15:16:35,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5750, loss[loss=0.1118, beats_loss=0.009454, ecapa_loss=0.0001055, whisper_loss=0.1012, over 18992.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001485, whisper_loss=0.08963, over 3830530.86 frames. ], batch size: 68, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:16:39,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.77 vs. limit=10.0 2024-08-17 15:16:42,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3377500.0, ans=0.1 2024-08-17 15:16:50,445 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-17 15:17:22,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.322e+01 2.527e+01 2.770e+01 4.049e+01, threshold=5.054e+01, percent-clipped=0.0 2024-08-17 15:17:30,542 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 22 from Vox, 34 from AS 2024-08-17 15:17:33,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3377900.0, ans=0.125 2024-08-17 15:17:34,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3377900.0, ans=0.0 2024-08-17 15:17:37,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5800, loss[loss=0.1015, beats_loss=0.009476, ecapa_loss=0.0001426, whisper_loss=0.09062, over 23142.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001486, whisper_loss=0.09036, over 3875477.82 frames. 
], batch size: 93, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:17:39,026 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 from AS 2024-08-17 15:17:43,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3378000.0, ans=0.125 2024-08-17 15:17:44,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3378000.0, ans=0.125 2024-08-17 15:17:48,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3378000.0, ans=0.125 2024-08-17 15:17:53,046 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 27 from LS+wenet, 16 from Vox, 19 from AS 2024-08-17 15:18:00,032 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.60 vs. limit=15.0 2024-08-17 15:18:01,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3378200.0, ans=0.125 2024-08-17 15:18:10,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3378200.0, ans=0.125 2024-08-17 15:18:10,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3378200.0, ans=0.02 2024-08-17 15:18:29,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.50 vs. 
limit=22.5 2024-08-17 15:18:38,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3378400.0, ans=0.125 2024-08-17 15:18:40,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5850, loss[loss=0.09457, beats_loss=0.0111, ecapa_loss=0.0001379, whisper_loss=0.08209, over 19217.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001474, whisper_loss=0.09039, over 3844750.43 frames. ], batch size: 75, lr: 2.66e-03, grad_scale: 1.152921504606847e+18 2024-08-17 15:19:06,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3378700.0, ans=0.125 2024-08-17 15:19:07,980 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. limit=10.0 2024-08-17 15:19:28,427 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.390e+01 2.604e+01 2.871e+01 4.556e+01, threshold=5.208e+01, percent-clipped=0.0 2024-08-17 15:19:40,180 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 23 from Vox, 35 from AS 2024-08-17 15:19:42,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. limit=10.0 2024-08-17 15:19:43,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5900, loss[loss=0.1168, beats_loss=0.009108, ecapa_loss=0.0001293, whisper_loss=0.1064, over 16258.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001486, whisper_loss=0.09057, over 3854759.78 frames. 
], batch size: 62, lr: 2.66e-03, grad_scale: 1.152921504606847e+18 2024-08-17 15:19:50,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3379000.0, ans=0.125 2024-08-17 15:19:56,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3379100.0, ans=0.125 2024-08-17 15:20:09,901 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 from AS 2024-08-17 15:20:10,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3379200.0, ans=0.125 2024-08-17 15:20:20,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3379300.0, ans=0.125 2024-08-17 15:20:33,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3379400.0, ans=0.125 2024-08-17 15:20:34,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3379400.0, ans=0.125 2024-08-17 15:20:38,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3379400.0, ans=0.0 2024-08-17 15:20:45,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 5950, loss[loss=0.08635, beats_loss=0.01286, ecapa_loss=0.0001293, whisper_loss=0.0722, over 14605.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.09064, over 3863427.68 frames. 
], batch size: 58, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:20:48,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3379500.0, ans=0.5 2024-08-17 15:20:51,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.11 vs. limit=12.0 2024-08-17 15:21:03,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.77 vs. limit=15.0 2024-08-17 15:21:05,112 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-17 15:21:18,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3379700.0, ans=0.125 2024-08-17 15:21:21,804 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 21 from Vox, 29 from AS 2024-08-17 15:21:23,065 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 from AS 2024-08-17 15:21:31,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3379800.0, ans=0.2 2024-08-17 15:21:34,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3379800.0, ans=0.2 2024-08-17 15:21:35,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.244e+01 2.483e+01 2.977e+01 4.324e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-17 15:21:49,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6000, loss[loss=0.09735, beats_loss=0.008427, ecapa_loss=0.0001736, whisper_loss=0.08719, over 20922.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.09012, over 3874364.64 frames. 
], batch size: 84, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:21:49,214 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 15:22:23,042 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2521, beats_loss=0, ecapa_loss=0.0005335, whisper_loss=0.2467, over 922467.00 frames. 2024-08-17 15:22:37,668 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004145, beats_loss=0, ecapa_loss=0.0004145, whisper_loss=0, over 939242.00 frames. 2024-08-17 15:24:12,948 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02339, beats_loss=0.02339, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 15:24:12,952 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 15:24:29,711 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.16 vs. limit=6.0 2024-08-17 15:25:12,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3380400.0, ans=0.125 2024-08-17 15:25:19,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6050, loss[loss=0.08428, beats_loss=0.008831, ecapa_loss=0.000202, whisper_loss=0.07343, over 13102.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001485, whisper_loss=0.08963, over 3848023.70 frames. ], batch size: 54, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:25:27,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3380500.0, ans=0.125 2024-08-17 15:25:30,623 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 21 from Vox, 36 from AS 2024-08-17 15:25:32,266 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 22 from Vox, 46 from AS 2024-08-17 15:25:41,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.81 vs. limit=22.5 2024-08-17 15:25:50,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3380700.0, ans=0.125 2024-08-17 15:25:59,570 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 24 from Vox, 35 from AS 2024-08-17 15:26:13,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.302e+01 2.473e+01 2.788e+01 3.926e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-17 15:26:18,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3380900.0, ans=0.0 2024-08-17 15:26:23,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3380900.0, ans=0.1 2024-08-17 15:26:24,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3380900.0, ans=0.125 2024-08-17 15:26:28,378 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6100, loss[loss=0.1339, beats_loss=0.006673, ecapa_loss=0.0001502, whisper_loss=0.1257, over 23335.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001481, whisper_loss=0.08998, over 3857008.98 frames. ], batch size: 86, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:26:43,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3381100.0, ans=0.0 2024-08-17 15:26:56,312 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-17 15:27:03,397 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
19 from LS+wenet, 32 from Vox, 38 from AS 2024-08-17 15:27:15,320 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 28 from Vox, 23 from AS 2024-08-17 15:27:27,508 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 from AS 2024-08-17 15:27:36,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6150, loss[loss=0.07849, beats_loss=0.01145, ecapa_loss=0.0001667, whisper_loss=0.06537, over 13854.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0106, ecapa_loss=0.0001485, whisper_loss=0.08924, over 3846839.45 frames. ], batch size: 59, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:27:37,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3381500.0, ans=0.125 2024-08-17 15:27:38,869 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS 2024-08-17 15:27:46,876 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 20 from Vox, 16 from AS 2024-08-17 15:28:18,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3381700.0, ans=0.025 2024-08-17 15:28:34,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3381800.0, ans=0.125 2024-08-17 15:28:45,127 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.233e+01 2.467e+01 2.726e+01 4.750e+01, threshold=4.934e+01, percent-clipped=0.0 2024-08-17 15:28:50,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3381900.0, ans=0.2 2024-08-17 15:28:54,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3381900.0, ans=0.1 2024-08-17 15:28:59,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3381900.0, ans=0.0 2024-08-17 15:29:04,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6200, loss[loss=0.1083, beats_loss=0.009954, ecapa_loss=0.0001467, whisper_loss=0.09687, over 22416.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001487, whisper_loss=0.08909, over 3874761.16 frames. ], batch size: 90, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:29:19,851 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 from AS 2024-08-17 15:29:20,621 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.361e+01 2024-08-17 15:29:32,678 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 27 from Vox, 39 from AS 2024-08-17 15:29:42,614 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
25 from LS+wenet, 29 from Vox, 40 from AS 2024-08-17 15:30:26,957 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 from AS 2024-08-17 15:30:37,518 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6250, loss[loss=0.1072, beats_loss=0.009212, ecapa_loss=0.0001864, whisper_loss=0.09617, over 20646.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0106, ecapa_loss=0.0001492, whisper_loss=0.08907, over 3852913.94 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:30:40,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3382500.0, ans=0.0 2024-08-17 15:30:57,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3382600.0, ans=0.125 2024-08-17 15:30:58,307 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.90 vs. limit=5.0 2024-08-17 15:31:01,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3382600.0, ans=0.0 2024-08-17 15:31:13,797 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 24 from Vox, 32 from AS 2024-08-17 15:31:53,981 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.680e+01 2.369e+01 2.606e+01 2.979e+01 4.813e+02, threshold=5.211e+01, percent-clipped=2.0 2024-08-17 15:32:16,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6300, loss[loss=0.1064, beats_loss=0.009076, ecapa_loss=0.0001518, whisper_loss=0.09576, over 15891.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001498, whisper_loss=0.08939, over 3839175.43 frames. ], batch size: 62, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:32:24,768 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
18 from LS+wenet, 18 from Vox, 20 from AS 2024-08-17 15:32:38,451 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.552e+00 2024-08-17 15:32:58,898 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 21 from Vox, 31 from AS 2024-08-17 15:33:03,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3383200.0, ans=0.125 2024-08-17 15:33:20,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383300.0, ans=0.1 2024-08-17 15:33:26,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3383300.0, ans=0.1 2024-08-17 15:33:51,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6350, loss[loss=0.08061, beats_loss=0.01193, ecapa_loss=0.0001296, whisper_loss=0.06739, over 16442.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001512, whisper_loss=0.08907, over 3840728.16 frames. ], batch size: 70, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:34:06,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0 2024-08-17 15:34:12,617 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 15 from Vox, 43 from AS 2024-08-17 15:34:22,688 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 22 from Vox, 33 from AS 2024-08-17 15:34:30,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3383700.0, ans=0.125 2024-08-17 15:34:32,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. 
limit=15.0 2024-08-17 15:34:45,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3383800.0, ans=0.04949747468305833 2024-08-17 15:34:50,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3383800.0, ans=0.1 2024-08-17 15:34:50,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2024-08-17 15:34:51,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.290e+01 2.555e+01 2.760e+01 4.440e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-17 15:34:58,119 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 from AS 2024-08-17 15:35:07,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6400, loss[loss=0.08429, beats_loss=0.01143, ecapa_loss=0.0001595, whisper_loss=0.07127, over 21290.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001515, whisper_loss=0.08961, over 3841600.35 frames. ], batch size: 87, lr: 2.66e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:35:31,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. 
limit=15.0 2024-08-17 15:35:32,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3384100.0, ans=0.125 2024-08-17 15:35:39,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3384100.0, ans=0.125 2024-08-17 15:36:10,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3384300.0, ans=0.0 2024-08-17 15:36:15,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3384400.0, ans=0.125 2024-08-17 15:36:31,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6450, loss[loss=0.1091, beats_loss=0.01054, ecapa_loss=0.0001352, whisper_loss=0.09717, over 22700.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01062, ecapa_loss=0.0001499, whisper_loss=0.08889, over 3842867.49 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:36:51,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3384600.0, ans=0.05 2024-08-17 15:36:58,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3384600.0, ans=0.0 2024-08-17 15:37:07,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.0 2024-08-17 15:37:09,968 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 21 from Vox, 26 from AS 2024-08-17 15:37:18,195 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 25 from Vox, 35 from AS 2024-08-17 15:37:31,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3384800.0, ans=0.2 2024-08-17 15:37:37,332 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.419e+01 2.570e+01 2.770e+01 6.732e+01, threshold=5.141e+01, percent-clipped=1.0 2024-08-17 15:37:51,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3384900.0, ans=0.125 2024-08-17 15:37:54,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3384900.0, ans=0.125 2024-08-17 15:37:56,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6500, loss[loss=0.1167, beats_loss=0.01042, ecapa_loss=0.0001627, whisper_loss=0.1046, over 23048.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01056, ecapa_loss=0.0001498, whisper_loss=0.08947, over 3852264.50 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:38:18,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.49 vs. limit=6.0 2024-08-17 15:38:22,257 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS 2024-08-17 15:39:09,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3385400.0, ans=0.125 2024-08-17 15:39:13,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3385400.0, ans=0.0 2024-08-17 15:39:23,384 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6550, loss[loss=0.104, beats_loss=0.01069, ecapa_loss=0.000155, whisper_loss=0.09173, over 21540.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001491, whisper_loss=0.0895, over 3868713.88 frames. 
], batch size: 87, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:39:25,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3385500.0, ans=0.0 2024-08-17 15:39:37,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3385600.0, ans=0.125 2024-08-17 15:40:07,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3385800.0, ans=0.1 2024-08-17 15:40:11,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3385800.0, ans=0.2 2024-08-17 15:40:27,180 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.341e+01 2.572e+01 2.842e+01 4.504e+01, threshold=5.144e+01, percent-clipped=0.0 2024-08-17 15:40:34,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3385900.0, ans=0.125 2024-08-17 15:40:48,018 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6600, loss[loss=0.1007, beats_loss=0.007473, ecapa_loss=0.0001827, whisper_loss=0.09142, over 21164.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01064, ecapa_loss=0.0001489, whisper_loss=0.08893, over 3864183.55 frames. ], batch size: 88, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 15:40:53,543 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 25 from Vox, 41 from AS 2024-08-17 15:40:59,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3386000.0, ans=0.125 2024-08-17 15:41:15,779 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-17 15:41:16,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2024-08-17 15:41:37,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-17 15:41:38,083 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=22.5 2024-08-17 15:41:40,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3386200.0, ans=0.125 2024-08-17 15:41:44,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3386300.0, ans=0.125 2024-08-17 15:41:45,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-08-17 15:42:02,326 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 from AS 2024-08-17 15:42:13,224 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 22 from Vox, 42 from AS 2024-08-17 15:42:16,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3386400.0, ans=0.125 2024-08-17 15:42:25,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3386500.0, ans=0.0 2024-08-17 15:42:26,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6650, loss[loss=0.07255, beats_loss=0.01195, ecapa_loss=0.0001691, whisper_loss=0.05891, over 20808.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01067, ecapa_loss=0.0001486, whisper_loss=0.08917, over 3885502.52 frames. 
], batch size: 92, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:42:51,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3386600.0, ans=0.125 2024-08-17 15:43:00,467 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 19 from Vox, 35 from AS 2024-08-17 15:43:00,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3386600.0, ans=0.2 2024-08-17 15:43:25,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.44 vs. limit=15.0 2024-08-17 15:43:30,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3386800.0, ans=0.0 2024-08-17 15:43:33,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-17 15:43:41,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3386800.0, ans=0.125 2024-08-17 15:43:47,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.309e+01 2.525e+01 2.864e+01 3.753e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-17 15:43:49,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3386900.0, ans=0.0 2024-08-17 15:44:04,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6700, loss[loss=0.09394, beats_loss=0.01018, ecapa_loss=0.0001497, whisper_loss=0.08226, over 19798.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01064, ecapa_loss=0.0001485, whisper_loss=0.08907, over 3862180.89 frames. 
], batch size: 84, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:44:19,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3387100.0, ans=0.05 2024-08-17 15:44:23,677 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS 2024-08-17 15:44:25,579 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 16 from Vox, 28 from AS 2024-08-17 15:44:25,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3387100.0, ans=0.125 2024-08-17 15:44:28,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3387100.0, ans=0.1 2024-08-17 15:44:38,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 15:44:40,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3387200.0, ans=0.125 2024-08-17 15:44:56,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.65 vs. limit=15.0 2024-08-17 15:44:57,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3387300.0, ans=0.2 2024-08-17 15:45:06,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3387300.0, ans=0.0 2024-08-17 15:45:06,859 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.46 vs. 
limit=15.0 2024-08-17 15:45:18,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3387400.0, ans=0.0 2024-08-17 15:45:34,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6750, loss[loss=0.1022, beats_loss=0.01169, ecapa_loss=0.0001486, whisper_loss=0.089, over 21503.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01068, ecapa_loss=0.0001472, whisper_loss=0.08929, over 3900420.00 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:45:38,627 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 26 from LS+wenet, 9 from Vox, 26 from AS 2024-08-17 15:45:46,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3387500.0, ans=0.2 2024-08-17 15:45:47,489 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 25 from Vox, 31 from AS 2024-08-17 15:46:06,065 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 23 from Vox, 39 from AS 2024-08-17 15:46:09,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3387600.0, ans=0.0 2024-08-17 15:46:16,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3387700.0, ans=0.0 2024-08-17 15:46:18,123 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 25 from Vox, 26 from AS 2024-08-17 15:46:36,895 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 22 from Vox, 29 from AS 2024-08-17 15:46:54,527 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.393e+01 2.641e+01 2.928e+01 4.492e+01, threshold=5.282e+01, percent-clipped=0.0 2024-08-17 15:46:56,136 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 16 from Vox, 36 from AS 2024-08-17 15:47:02,524 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 21 from Vox, 28 from AS 2024-08-17 15:47:14,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6800, loss[loss=0.1118, beats_loss=0.009597, ecapa_loss=0.0001724, whisper_loss=0.1004, over 22575.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001484, whisper_loss=0.08993, over 3920312.60 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:47:18,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3388000.0, ans=0.125 2024-08-17 15:47:27,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3388000.0, ans=0.0 2024-08-17 15:48:01,212 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.019e-02 2024-08-17 15:48:02,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3388200.0, ans=0.2 2024-08-17 15:48:07,948 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 from AS 2024-08-17 15:48:48,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3388400.0, ans=10.0 2024-08-17 15:48:53,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6850, loss[loss=0.08524, beats_loss=0.009219, ecapa_loss=0.0001698, whisper_loss=0.07432, over 15581.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001491, whisper_loss=0.09032, over 3897029.90 frames. ], batch size: 59, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:49:24,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.94 vs. 
limit=10.0 2024-08-17 15:49:29,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3388700.0, ans=0.0 2024-08-17 15:49:46,079 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 19 from Vox, 42 from AS 2024-08-17 15:49:47,989 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 18 from Vox, 42 from AS 2024-08-17 15:49:48,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3388800.0, ans=0.2 2024-08-17 15:49:58,598 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 15:50:06,470 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.332e+01 2.534e+01 2.809e+01 1.632e+02, threshold=5.068e+01, percent-clipped=1.0 2024-08-17 15:50:27,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6900, loss[loss=0.09172, beats_loss=0.01246, ecapa_loss=0.0001308, whisper_loss=0.07795, over 15334.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001505, whisper_loss=0.09062, over 3910452.61 frames. ], batch size: 60, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:50:36,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3389000.0, ans=0.0 2024-08-17 15:51:33,433 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 from AS 2024-08-17 15:51:39,482 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 18 from Vox, 27 from AS 2024-08-17 15:51:45,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3389300.0, ans=0.0 2024-08-17 15:52:08,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 6950, loss[loss=0.07996, beats_loss=0.01136, ecapa_loss=0.0001689, whisper_loss=0.06691, over 18559.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001502, whisper_loss=0.09077, over 3911836.16 frames. ], batch size: 79, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:52:13,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3389500.0, ans=0.125 2024-08-17 15:52:19,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3389500.0, ans=0.0 2024-08-17 15:52:29,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3389600.0, ans=0.125 2024-08-17 15:52:49,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3389700.0, ans=0.125 2024-08-17 15:53:06,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.84 vs. limit=10.0 2024-08-17 15:53:24,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.373e+01 2.584e+01 2.825e+01 4.780e+01, threshold=5.168e+01, percent-clipped=0.0 2024-08-17 15:53:41,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7000, loss[loss=0.09736, beats_loss=0.01217, ecapa_loss=0.0001255, whisper_loss=0.08393, over 21580.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001493, whisper_loss=0.09157, over 3918564.32 frames. 
], batch size: 88, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:53:49,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3390000.0, ans=0.0 2024-08-17 15:53:52,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390000.0, ans=0.1 2024-08-17 15:54:25,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.75 vs. limit=22.5 2024-08-17 15:54:39,270 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 25 from Vox, 30 from AS 2024-08-17 15:54:49,821 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 32 from Vox, 32 from AS 2024-08-17 15:54:56,855 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 28 from Vox, 29 from AS 2024-08-17 15:55:06,703 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7050, loss[loss=0.111, beats_loss=0.006536, ecapa_loss=0.0001482, whisper_loss=0.103, over 17400.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001492, whisper_loss=0.09147, over 3888105.49 frames. ], batch size: 68, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:55:13,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3390500.0, ans=0.05 2024-08-17 15:55:43,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390700.0, ans=0.1 2024-08-17 15:55:52,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3390700.0, ans=0.125 2024-08-17 15:55:55,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2024-08-17 15:56:03,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3390800.0, ans=0.04949747468305833 2024-08-17 15:56:05,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3390800.0, ans=0.05 2024-08-17 15:56:07,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3390800.0, ans=0.04949747468305833 2024-08-17 15:56:16,216 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.336e+01 2.552e+01 2.787e+01 4.135e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-17 15:56:17,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3390900.0, ans=0.1 2024-08-17 15:56:18,507 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 18 from Vox, 36 from AS 2024-08-17 15:56:25,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3390900.0, ans=0.0 2024-08-17 15:56:32,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7100, loss[loss=0.1005, beats_loss=0.01078, ecapa_loss=0.0001366, whisper_loss=0.0884, over 15059.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001487, whisper_loss=0.09133, over 3844882.63 frames. ], batch size: 58, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:57:08,760 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-08-17 15:57:10,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3391200.0, ans=0.125 2024-08-17 15:57:15,650 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 18 from Vox, 40 from AS 2024-08-17 15:57:22,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3391200.0, ans=0.125 2024-08-17 15:57:33,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3391300.0, ans=0.0 2024-08-17 15:57:44,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3391400.0, ans=0.125 2024-08-17 15:57:55,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3391400.0, ans=0.125 2024-08-17 15:57:59,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7150, loss[loss=0.07201, beats_loss=0.01006, ecapa_loss=0.0001837, whisper_loss=0.06011, over 15803.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001495, whisper_loss=0.09161, over 3870541.40 frames. ], batch size: 67, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:58:13,829 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 10 from Vox, 38 from AS 2024-08-17 15:58:24,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3391600.0, ans=0.2 2024-08-17 15:58:32,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3391700.0, ans=0.125 2024-08-17 15:58:49,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3391800.0, ans=0.125 2024-08-17 15:58:50,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3391800.0, ans=0.0 2024-08-17 15:59:04,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.298e+01 2.545e+01 2.788e+01 4.387e+02, threshold=5.090e+01, percent-clipped=2.0 2024-08-17 15:59:08,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3391900.0, ans=0.1 2024-08-17 15:59:11,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0 2024-08-17 15:59:20,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7200, loss[loss=0.1155, beats_loss=0.009711, ecapa_loss=0.0001565, whisper_loss=0.1042, over 21947.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001493, whisper_loss=0.09085, over 3905461.19 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 15:59:34,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3392100.0, ans=0.0 2024-08-17 15:59:37,174 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 from AS 2024-08-17 15:59:47,125 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 15:59:52,057 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 17 from Vox, 45 from AS 2024-08-17 16:00:02,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2024-08-17 16:00:09,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3392300.0, ans=0.125 2024-08-17 16:00:21,457 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 28 from LS+wenet, 11 from Vox, 24 from AS 2024-08-17 16:00:27,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.53 vs. limit=22.5 2024-08-17 16:00:40,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3392500.0, ans=0.0 2024-08-17 16:00:41,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7250, loss[loss=0.09842, beats_loss=0.01074, ecapa_loss=0.0001445, whisper_loss=0.08623, over 19228.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001497, whisper_loss=0.09078, over 3895712.24 frames. ], batch size: 75, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:00:49,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3392500.0, ans=0.1 2024-08-17 16:01:01,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3392600.0, ans=0.125 2024-08-17 16:01:02,317 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 18 from Vox, 19 from AS 2024-08-17 16:01:15,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3392700.0, ans=0.0 2024-08-17 16:01:16,900 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 from AS 2024-08-17 16:01:21,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3392700.0, ans=0.05 2024-08-17 16:01:23,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3392700.0, ans=0.1 2024-08-17 16:01:27,964 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 from AS 2024-08-17 16:01:39,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3392800.0, ans=0.1 2024-08-17 16:01:47,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.301e+01 2.611e+01 2.944e+01 3.835e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 16:02:00,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3392900.0, ans=0.125 2024-08-17 16:02:05,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7300, loss[loss=0.107, beats_loss=0.008331, ecapa_loss=0.0001331, whisper_loss=0.09733, over 15481.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001493, whisper_loss=0.0908, over 3885015.11 frames. ], batch size: 57, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:02:05,431 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 from AS 2024-08-17 16:02:23,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. 
limit=15.0 2024-08-17 16:02:37,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3393200.0, ans=0.05 2024-08-17 16:02:38,887 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 14 from Vox, 40 from AS 2024-08-17 16:02:57,811 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.69 vs. limit=15.0 2024-08-17 16:03:00,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3393300.0, ans=0.1 2024-08-17 16:03:03,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3393300.0, ans=0.125 2024-08-17 16:03:06,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2024-08-17 16:03:24,833 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 22 from Vox, 32 from AS 2024-08-17 16:03:27,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7350, loss[loss=0.1139, beats_loss=0.009854, ecapa_loss=0.0001314, whisper_loss=0.1027, over 19178.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001492, whisper_loss=0.09106, over 3917520.04 frames. ], batch size: 74, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:03:41,474 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 24 from Vox, 23 from AS 2024-08-17 16:03:43,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3393600.0, ans=0.0 2024-08-17 16:03:48,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3393600.0, ans=0.125 2024-08-17 16:03:52,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=15.0 2024-08-17 16:03:52,898 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 22 from Vox, 17 from AS 2024-08-17 16:03:54,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3393600.0, ans=0.04949747468305833 2024-08-17 16:03:54,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3393600.0, ans=0.125 2024-08-17 16:04:06,505 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 from AS 2024-08-17 16:04:11,515 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 28 from Vox, 27 from AS 2024-08-17 16:04:23,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3393800.0, ans=0.05 2024-08-17 16:04:31,307 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.343e+01 2.608e+01 3.088e+01 3.317e+02, threshold=5.216e+01, percent-clipped=4.0 2024-08-17 16:04:31,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3393900.0, ans=0.2 2024-08-17 16:04:45,329 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
20 from LS+wenet, 19 from Vox, 28 from AS 2024-08-17 16:04:46,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7400, loss[loss=0.1006, beats_loss=0.009585, ecapa_loss=0.0001341, whisper_loss=0.08963, over 17778.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001499, whisper_loss=0.09085, over 3885844.43 frames. ], batch size: 67, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:04:48,334 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 from AS 2024-08-17 16:05:12,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3394100.0, ans=0.95 2024-08-17 16:05:20,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3394200.0, ans=0.1 2024-08-17 16:05:26,462 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 18 from Vox, 33 from AS 2024-08-17 16:05:30,057 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 27 from Vox, 25 from AS 2024-08-17 16:05:34,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.45 vs. limit=15.0 2024-08-17 16:05:51,278 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 23 from Vox, 32 from AS 2024-08-17 16:06:10,452 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 21 from Vox, 43 from AS 2024-08-17 16:06:12,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7450, loss[loss=0.1133, beats_loss=0.01107, ecapa_loss=0.0001484, whisper_loss=0.1007, over 23853.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001499, whisper_loss=0.09097, over 3868349.91 frames. 
], batch size: 93, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:06:23,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3394500.0, ans=0.125 2024-08-17 16:06:26,840 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 from AS 2024-08-17 16:06:39,230 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 14 from Vox, 32 from AS 2024-08-17 16:06:39,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=12.0 2024-08-17 16:06:43,016 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.842e+00 2024-08-17 16:06:58,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2024-08-17 16:07:02,627 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 16:07:14,125 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
34 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 16:07:14,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3394800.0, ans=0.125 2024-08-17 16:07:22,055 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.262e+01 2.481e+01 2.703e+01 3.740e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-17 16:07:27,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3394900.0, ans=0.125 2024-08-17 16:07:30,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3394900.0, ans=0.1 2024-08-17 16:07:38,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7500, loss[loss=0.1249, beats_loss=0.005906, ecapa_loss=0.0001898, whisper_loss=0.1171, over 19051.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01047, ecapa_loss=0.0001512, whisper_loss=0.09146, over 3888914.31 frames. ], batch size: 75, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:07:46,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=12.0 2024-08-17 16:07:50,229 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-17 16:07:50,777 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.37 vs. 
limit=6.0 2024-08-17 16:07:59,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3395100.0, ans=0.0 2024-08-17 16:08:09,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3395100.0, ans=0.0 2024-08-17 16:08:13,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3395200.0, ans=0.125 2024-08-17 16:08:27,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3395200.0, ans=0.125 2024-08-17 16:08:33,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3395300.0, ans=0.04949747468305833 2024-08-17 16:08:43,216 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 16:09:04,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7550, loss[loss=0.08669, beats_loss=0.01236, ecapa_loss=0.0001415, whisper_loss=0.07291, over 21915.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001507, whisper_loss=0.09145, over 3923814.47 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:09:09,666 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-17 16:09:16,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3395500.0, ans=0.0 2024-08-17 16:09:19,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3395500.0, ans=0.125 2024-08-17 16:09:28,078 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 16:09:39,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3395700.0, ans=0.2 2024-08-17 16:09:56,902 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 16:10:04,263 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 16:10:09,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.968e+01 2.390e+01 2.660e+01 3.033e+01 1.573e+02, threshold=5.320e+01, percent-clipped=3.0 2024-08-17 16:10:24,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7600, loss[loss=0.1138, beats_loss=0.00814, ecapa_loss=0.0001611, whisper_loss=0.104, over 15465.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01042, ecapa_loss=0.0001507, whisper_loss=0.09195, over 3929187.18 frames. ], batch size: 63, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:10:29,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3396000.0, ans=0.125 2024-08-17 16:10:32,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3396000.0, ans=0.0 2024-08-17 16:10:46,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. 
limit=15.0 2024-08-17 16:10:48,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.45 vs. limit=15.0 2024-08-17 16:10:53,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3396100.0, ans=0.125 2024-08-17 16:10:55,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3396200.0, ans=0.1 2024-08-17 16:11:01,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3396200.0, ans=0.0 2024-08-17 16:11:03,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3396200.0, ans=0.2 2024-08-17 16:11:41,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7650, loss[loss=0.09354, beats_loss=0.00958, ecapa_loss=0.000164, whisper_loss=0.08232, over 15247.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001524, whisper_loss=0.09094, over 3897169.42 frames. ], batch size: 63, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:11:55,471 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 16:12:00,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3396600.0, ans=0.125 2024-08-17 16:12:04,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3396600.0, ans=0.0 2024-08-17 16:12:07,330 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 16:12:15,937 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-17 16:12:17,365 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-17 16:12:32,504 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 16:12:41,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.270e+01 2.469e+01 2.738e+01 5.063e+01, threshold=4.938e+01, percent-clipped=0.0 2024-08-17 16:12:43,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3396900.0, ans=0.0 2024-08-17 16:12:56,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7700, loss[loss=0.1042, beats_loss=0.01164, ecapa_loss=0.000115, whisper_loss=0.09143, over 22199.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.000151, whisper_loss=0.09046, over 3905539.04 frames. ], batch size: 86, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:13:00,063 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 25 from Vox, 48 fro AS 2024-08-17 16:13:05,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3397000.0, ans=0.1 2024-08-17 16:13:20,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3397100.0, ans=0.0 2024-08-17 16:13:38,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3397300.0, ans=0.1 2024-08-17 16:13:41,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3397300.0, ans=0.2 2024-08-17 16:13:47,611 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
18 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-17 16:13:48,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.30 vs. limit=15.0 2024-08-17 16:14:09,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7750, loss[loss=0.1191, beats_loss=0.009471, ecapa_loss=0.0001491, whisper_loss=0.1081, over 22271.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001508, whisper_loss=0.09019, over 3924599.21 frames. ], batch size: 88, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:14:33,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-17 16:14:38,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-17 16:14:40,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3397700.0, ans=0.2 2024-08-17 16:14:43,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3397700.0, ans=0.1 2024-08-17 16:14:49,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-17 16:15:06,093 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.283e+01 2.591e+01 2.932e+01 1.157e+02, threshold=5.183e+01, percent-clipped=2.0 2024-08-17 16:15:20,525 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7800, loss[loss=0.09655, beats_loss=0.01045, ecapa_loss=0.000146, whisper_loss=0.08464, over 17851.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001517, whisper_loss=0.08973, over 3891787.25 frames. 
], batch size: 75, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:15:21,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3398000.0, ans=0.0 2024-08-17 16:15:25,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3398000.0, ans=0.125 2024-08-17 16:15:38,533 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-17 16:15:40,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3398100.0, ans=0.125 2024-08-17 16:15:42,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3398100.0, ans=0.1 2024-08-17 16:15:51,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3398200.0, ans=0.125 2024-08-17 16:15:55,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3398200.0, ans=0.0 2024-08-17 16:15:56,633 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 16:16:03,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3398300.0, ans=0.015 2024-08-17 16:16:12,706 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 16:16:17,727 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-17 16:16:33,287 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7850, loss[loss=0.1052, beats_loss=0.009461, ecapa_loss=0.0001466, whisper_loss=0.09423, over 22567.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.000152, whisper_loss=0.08994, over 3877997.51 frames. 
], batch size: 90, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:16:38,760 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 16:16:46,584 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-17 16:17:03,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2024-08-17 16:17:26,792 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-17 16:17:41,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-08-17 16:17:49,832 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.263e+01 2.535e+01 2.863e+01 4.043e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-17 16:18:00,751 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-17 16:18:10,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3398900.0, ans=0.05 2024-08-17 16:18:13,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-08-17 16:18:13,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7900, loss[loss=0.1053, beats_loss=0.01142, ecapa_loss=0.0001325, whisper_loss=0.09254, over 14058.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001521, whisper_loss=0.09067, over 3902388.18 frames. ], batch size: 55, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:18:22,333 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 16:18:31,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3399000.0, ans=0.2 2024-08-17 16:18:38,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3399100.0, ans=0.125 2024-08-17 16:18:41,534 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-17 16:19:13,694 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-17 16:19:25,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3399300.0, ans=0.125 2024-08-17 16:19:30,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3399300.0, ans=0.125 2024-08-17 16:19:31,545 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-17 16:19:31,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3399300.0, ans=0.0 2024-08-17 16:19:34,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3399300.0, ans=0.125 2024-08-17 16:19:51,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3399400.0, ans=0.05 2024-08-17 16:19:55,887 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 7950, loss[loss=0.103, beats_loss=0.01214, ecapa_loss=0.0001141, whisper_loss=0.08969, over 23276.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01035, ecapa_loss=0.0001508, whisper_loss=0.09115, over 3906179.96 frames. 
], batch size: 91, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:20:04,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3399500.0, ans=0.2 2024-08-17 16:20:19,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3399600.0, ans=0.0 2024-08-17 16:21:16,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-08-17 16:21:17,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3399800.0, ans=0.125 2024-08-17 16:21:20,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.901e+01 2.432e+01 2.654e+01 2.954e+01 3.124e+02, threshold=5.307e+01, percent-clipped=2.0 2024-08-17 16:21:44,315 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8000, loss[loss=0.08267, beats_loss=0.01241, ecapa_loss=0.0001421, whisper_loss=0.06884, over 21006.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01037, ecapa_loss=0.0001499, whisper_loss=0.09122, over 3914859.26 frames. ], batch size: 88, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:21:46,731 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-17 16:22:00,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3400000.0, ans=0.0 2024-08-17 16:22:02,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3400000.0, ans=0.2 2024-08-17 16:22:07,004 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 16:22:09,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3400100.0, ans=0.125 2024-08-17 16:22:21,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=15.0 2024-08-17 16:22:25,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3400200.0, ans=0.125 2024-08-17 16:22:36,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-17 16:22:42,068 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-17 16:22:45,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3400200.0, ans=0.125 2024-08-17 16:22:48,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-08-17 16:22:51,728 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-17 16:23:30,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8050, loss[loss=0.1267, beats_loss=0.009107, ecapa_loss=0.0001584, whisper_loss=0.116, over 23329.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01031, ecapa_loss=0.0001505, whisper_loss=0.09279, over 3937124.70 frames. ], batch size: 94, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:24:01,467 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
17 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 16:24:01,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3400600.0, ans=0.125 2024-08-17 16:24:03,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3400600.0, ans=0.1 2024-08-17 16:24:31,746 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-17 16:24:52,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.357e+01 2.561e+01 2.862e+01 4.337e+01, threshold=5.122e+01, percent-clipped=0.0 2024-08-17 16:25:08,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8100, loss[loss=0.08247, beats_loss=0.008532, ecapa_loss=0.00015, whisper_loss=0.07244, over 19890.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01041, ecapa_loss=0.0001504, whisper_loss=0.09142, over 3938625.41 frames. ], batch size: 77, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:25:18,372 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 16:25:27,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3401100.0, ans=0.125 2024-08-17 16:25:31,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3401100.0, ans=0.1 2024-08-17 16:25:35,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3401200.0, ans=0.125 2024-08-17 16:25:41,123 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-17 16:25:47,443 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 16:25:50,212 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 16:25:54,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3401300.0, ans=0.125 2024-08-17 16:25:58,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3401300.0, ans=0.125 2024-08-17 16:25:59,306 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 16:26:13,425 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8150, loss[loss=0.1223, beats_loss=0.009348, ecapa_loss=0.0001458, whisper_loss=0.1115, over 19263.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01036, ecapa_loss=0.0001499, whisper_loss=0.09186, over 3936359.19 frames. ], batch size: 74, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:26:15,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3401500.0, ans=0.95 2024-08-17 16:26:19,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2024-08-17 16:26:30,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3401600.0, ans=0.0 2024-08-17 16:26:51,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3401800.0, ans=0.125 2024-08-17 16:27:00,002 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 16:27:01,438 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
19 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-17 16:27:05,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.914e+01 2.351e+01 2.682e+01 3.225e+01 8.305e+01, threshold=5.364e+01, percent-clipped=1.0 2024-08-17 16:27:06,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2024-08-17 16:27:18,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8200, loss[loss=0.09834, beats_loss=0.01202, ecapa_loss=0.0001324, whisper_loss=0.085, over 18452.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01034, ecapa_loss=0.0001498, whisper_loss=0.09199, over 3927019.07 frames. ], batch size: 74, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:27:38,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3402100.0, ans=0.125 2024-08-17 16:27:41,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.27 vs. limit=15.0 2024-08-17 16:27:46,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.24 vs. limit=10.0 2024-08-17 16:27:58,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3402300.0, ans=0.0 2024-08-17 16:28:21,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3402400.0, ans=0.125 2024-08-17 16:28:21,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.07 vs. 
limit=10.0 2024-08-17 16:28:23,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8250, loss[loss=0.1031, beats_loss=0.01128, ecapa_loss=0.0001385, whisper_loss=0.09044, over 21957.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01034, ecapa_loss=0.0001498, whisper_loss=0.09184, over 3938821.37 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:28:36,554 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 16:28:42,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2024-08-17 16:28:52,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3402700.0, ans=0.1 2024-08-17 16:28:57,662 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 16:28:59,095 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 16:29:00,474 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-17 16:29:07,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3402800.0, ans=0.1 2024-08-17 16:29:16,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.415e+01 2.647e+01 2.985e+01 4.296e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-17 16:29:29,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8300, loss[loss=0.08477, beats_loss=0.01132, ecapa_loss=0.0001419, whisper_loss=0.07204, over 19991.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001496, whisper_loss=0.09122, over 3962744.67 frames. 
], batch size: 81, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:29:31,545 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 16:29:36,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2024-08-17 16:29:42,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3403100.0, ans=0.95 2024-08-17 16:29:43,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3403100.0, ans=0.125 2024-08-17 16:30:05,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3403200.0, ans=0.0 2024-08-17 16:30:05,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=12.0 2024-08-17 16:30:06,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3403200.0, ans=0.0 2024-08-17 16:30:06,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-08-17 16:30:27,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3403400.0, ans=0.125 2024-08-17 16:30:34,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-08-17 16:30:36,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8350, loss[loss=0.1143, beats_loss=0.01095, ecapa_loss=0.0001355, whisper_loss=0.102, over 22551.00 frames. 
], tot_loss[loss=0.104, beats_loss=0.01047, ecapa_loss=0.0001482, whisper_loss=0.09207, over 3979752.35 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:30:43,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3403500.0, ans=0.125 2024-08-17 16:31:05,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3403700.0, ans=0.125 2024-08-17 16:31:11,292 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.94 vs. limit=22.5 2024-08-17 16:31:27,442 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.334e+01 2.617e+01 2.973e+01 3.819e+01, threshold=5.233e+01, percent-clipped=0.0 2024-08-17 16:31:31,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3403900.0, ans=0.125 2024-08-17 16:31:39,526 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 16:31:40,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8400, loss[loss=0.09795, beats_loss=0.01341, ecapa_loss=0.0001523, whisper_loss=0.08301, over 21192.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001493, whisper_loss=0.09128, over 3978109.04 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:32:43,322 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 16:32:45,856 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8450, loss[loss=0.08494, beats_loss=0.01325, ecapa_loss=0.0001365, whisper_loss=0.07032, over 19859.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01063, ecapa_loss=0.000149, whisper_loss=0.09079, over 3957539.86 frames. 
], batch size: 81, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:32:48,009 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.112e-03 2024-08-17 16:32:50,182 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 16:33:03,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3404600.0, ans=0.1 2024-08-17 16:33:06,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3404600.0, ans=0.125 2024-08-17 16:33:07,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3404600.0, ans=0.125 2024-08-17 16:33:17,724 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-17 16:33:36,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3404800.0, ans=0.125 2024-08-17 16:33:38,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.287e+01 2.644e+01 3.078e+01 2.118e+02, threshold=5.288e+01, percent-clipped=3.0 2024-08-17 16:33:41,644 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-17 16:33:49,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3404900.0, ans=0.0 2024-08-17 16:33:52,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8500, loss[loss=0.08883, beats_loss=0.01253, ecapa_loss=0.0001711, whisper_loss=0.07458, over 16100.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001497, whisper_loss=0.09087, over 3951614.69 frames. 
], batch size: 68, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:33:55,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.03 vs. limit=15.0 2024-08-17 16:33:56,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3405000.0, ans=0.125 2024-08-17 16:34:21,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3405200.0, ans=0.125 2024-08-17 16:34:22,445 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 16:34:24,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-08-17 16:34:28,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3405200.0, ans=0.125 2024-08-17 16:34:29,571 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-17 16:34:36,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3405300.0, ans=0.0 2024-08-17 16:34:38,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3405300.0, ans=0.125 2024-08-17 16:34:50,505 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-17 16:34:58,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8550, loss[loss=0.09618, beats_loss=0.007788, ecapa_loss=0.0001643, whisper_loss=0.08675, over 17603.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001491, whisper_loss=0.09114, over 3940049.45 frames. 
], batch size: 71, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:35:00,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3405500.0, ans=0.125 2024-08-17 16:35:00,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.49 vs. limit=15.0 2024-08-17 16:35:04,638 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-08-17 16:35:09,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3405500.0, ans=0.05 2024-08-17 16:35:16,886 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 16:35:20,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3405600.0, ans=0.2 2024-08-17 16:35:25,951 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 16:35:28,752 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 16:35:30,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3405700.0, ans=0.0 2024-08-17 16:35:44,883 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 16:35:51,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.281e+01 2.591e+01 2.805e+01 5.234e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 16:35:52,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. 
limit=15.0 2024-08-17 16:36:04,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8600, loss[loss=0.1004, beats_loss=0.007061, ecapa_loss=0.0001655, whisper_loss=0.09171, over 14352.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001495, whisper_loss=0.09115, over 3899073.51 frames. ], batch size: 54, lr: 2.65e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 16:36:18,313 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-17 16:36:32,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3406200.0, ans=0.04949747468305833 2024-08-17 16:36:52,064 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 23 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-17 16:37:03,512 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 14 from Vox, 50 fro AS 2024-08-17 16:37:04,717 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 16:37:12,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8650, loss[loss=0.1114, beats_loss=0.007127, ecapa_loss=0.0001819, whisper_loss=0.1024, over 15571.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001486, whisper_loss=0.09121, over 3910746.50 frames. ], batch size: 61, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:37:13,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3406500.0, ans=0.125 2024-08-17 16:37:16,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3406500.0, ans=0.0 2024-08-17 16:37:16,946 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 16:37:20,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3406500.0, ans=10.0 2024-08-17 16:37:30,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3406600.0, ans=0.0 2024-08-17 16:37:49,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3406700.0, ans=0.125 2024-08-17 16:38:06,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.354e+01 2.688e+01 3.025e+01 2.265e+02, threshold=5.375e+01, percent-clipped=1.0 2024-08-17 16:38:15,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3406900.0, ans=0.07 2024-08-17 16:38:21,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8700, loss[loss=0.1096, beats_loss=0.01132, ecapa_loss=0.0001354, whisper_loss=0.09696, over 22173.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001485, whisper_loss=0.09068, over 3896495.12 frames. ], batch size: 88, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:38:47,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3407100.0, ans=0.0 2024-08-17 16:39:07,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3407300.0, ans=0.2 2024-08-17 16:39:10,562 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
26 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-17 16:39:13,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3407300.0, ans=0.015 2024-08-17 16:39:32,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8750, loss[loss=0.1205, beats_loss=0.006789, ecapa_loss=0.0002018, whisper_loss=0.1117, over 21881.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001475, whisper_loss=0.09127, over 3882963.54 frames. ], batch size: 89, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:39:42,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-08-17 16:40:27,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.350e+01 2.581e+01 3.005e+01 1.666e+02, threshold=5.162e+01, percent-clipped=1.0 2024-08-17 16:40:32,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3407900.0, ans=6.0 2024-08-17 16:40:40,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8800, loss[loss=0.1037, beats_loss=0.01089, ecapa_loss=0.0001415, whisper_loss=0.09138, over 16903.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001474, whisper_loss=0.09084, over 3869459.34 frames. ], batch size: 67, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:41:33,177 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-17 16:41:33,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0 2024-08-17 16:41:48,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8850, loss[loss=0.1218, beats_loss=0.008219, ecapa_loss=0.0001811, whisper_loss=0.1118, over 20659.00 frames. 
], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001465, whisper_loss=0.09104, over 3877791.48 frames. ], batch size: 85, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:41:55,251 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 34 from Vox, 35 fro AS 2024-08-17 16:41:55,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3408500.0, ans=0.2 2024-08-17 16:42:21,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3408700.0, ans=0.125 2024-08-17 16:42:27,625 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 16:42:44,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.312e+01 2.500e+01 2.835e+01 4.395e+02, threshold=5.001e+01, percent-clipped=3.0 2024-08-17 16:42:58,247 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 16:42:59,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8900, loss[loss=0.1165, beats_loss=0.008561, ecapa_loss=0.0001392, whisper_loss=0.1065, over 24413.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01046, ecapa_loss=0.0001484, whisper_loss=0.09166, over 3915166.62 frames. ], batch size: 93, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:43:05,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3409000.0, ans=0.2 2024-08-17 16:43:13,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3409100.0, ans=0.0 2024-08-17 16:43:24,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3409100.0, ans=0.125 2024-08-17 16:43:28,522 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
16 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-17 16:43:28,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3409200.0, ans=0.2 2024-08-17 16:43:32,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3409200.0, ans=0.0 2024-08-17 16:43:39,193 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 16:43:45,675 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 16:43:45,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3409300.0, ans=0.125 2024-08-17 16:43:59,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3409400.0, ans=0.1 2024-08-17 16:44:07,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3409500.0, ans=0.1 2024-08-17 16:44:07,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 8950, loss[loss=0.09551, beats_loss=0.01226, ecapa_loss=0.0001444, whisper_loss=0.0818, over 20319.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001483, whisper_loss=0.09137, over 3933945.48 frames. ], batch size: 83, lr: 2.65e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:44:09,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3409500.0, ans=0.125 2024-08-17 16:44:12,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3409500.0, ans=0.125 2024-08-17 16:44:13,225 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-17 16:44:14,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3409500.0, ans=0.0 2024-08-17 16:44:44,158 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 16:44:53,889 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 16:45:00,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.330e+01 2.600e+01 2.951e+01 4.837e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-17 16:45:13,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9000, loss[loss=0.1014, beats_loss=0.01108, ecapa_loss=0.0001746, whisper_loss=0.08862, over 20689.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.000149, whisper_loss=0.09066, over 3907683.54 frames. ], batch size: 87, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:45:13,733 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 16:45:49,023 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on ASR_libri: loss=0.2519, beats_loss=0, ecapa_loss=0.0005245, whisper_loss=0.2466, over 922467.00 frames. 2024-08-17 16:46:06,613 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on SV_voxceleb1: loss=0.004189, beats_loss=0, ecapa_loss=0.0004189, whisper_loss=0, over 939242.00 frames. 2024-08-17 16:47:47,290 INFO [train_multi_KD3.py:1149] (3/4) Epoch 23, validation on AT_audioset: loss=0.02324, beats_loss=0.02324, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 16:47:47,300 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 16:47:55,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3410000.0, ans=10.0 2024-08-17 16:47:55,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-17 16:48:07,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3410100.0, ans=0.125 2024-08-17 16:48:22,424 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-17 16:48:34,732 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-17 16:48:36,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3410300.0, ans=0.125 2024-08-17 16:48:53,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3410500.0, ans=0.125 2024-08-17 16:48:53,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. limit=10.0 2024-08-17 16:48:54,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9050, loss[loss=0.1029, beats_loss=0.009486, ecapa_loss=0.0001554, whisper_loss=0.0919, over 17030.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001501, whisper_loss=0.09077, over 3923028.67 frames. ], batch size: 66, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:48:57,254 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-17 16:49:06,460 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-17 16:49:16,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3410600.0, ans=0.2 2024-08-17 16:49:20,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.31 vs. limit=22.5 2024-08-17 16:49:24,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3410700.0, ans=0.0 2024-08-17 16:49:38,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3410800.0, ans=0.0 2024-08-17 16:49:40,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3410800.0, ans=0.2 2024-08-17 16:49:47,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3410900.0, ans=0.125 2024-08-17 16:49:47,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.651e+01 2.352e+01 2.621e+01 2.985e+01 9.819e+01, threshold=5.241e+01, percent-clipped=2.0 2024-08-17 16:49:51,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3410900.0, ans=0.125 2024-08-17 16:49:58,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=15.0 2024-08-17 16:50:01,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9100, loss[loss=0.1224, beats_loss=0.008901, ecapa_loss=0.0001402, whisper_loss=0.1121, over 23391.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.00015, whisper_loss=0.09039, over 3916024.13 frames. 
], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:50:01,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3411000.0, ans=0.2 2024-08-17 16:50:10,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3411000.0, ans=0.125 2024-08-17 16:50:17,509 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 16:50:19,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3411100.0, ans=0.2 2024-08-17 16:50:52,546 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-17 16:50:55,326 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-17 16:50:55,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3411400.0, ans=0.125 2024-08-17 16:51:04,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2024-08-17 16:51:08,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9150, loss[loss=0.09971, beats_loss=0.01107, ecapa_loss=0.000155, whisper_loss=0.08709, over 21571.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001484, whisper_loss=0.09016, over 3939949.60 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:51:09,274 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-17 16:51:09,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3411500.0, ans=0.2 2024-08-17 16:51:15,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2024-08-17 16:51:28,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3411600.0, ans=0.125 2024-08-17 16:51:30,674 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-17 16:51:30,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3411600.0, ans=10.0 2024-08-17 16:51:38,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-08-17 16:51:49,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3411800.0, ans=0.125 2024-08-17 16:51:53,482 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 16:51:57,568 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 16:52:01,836 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 16:52:04,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.286e+01 2.611e+01 2.888e+01 4.834e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 16:52:16,118 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-17 16:52:18,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9200, loss[loss=0.1045, beats_loss=0.01011, ecapa_loss=0.0001766, whisper_loss=0.09266, over 16200.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001486, whisper_loss=0.09049, over 3908859.51 frames. ], batch size: 67, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:52:22,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3412000.0, ans=0.0 2024-08-17 16:52:25,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3412000.0, ans=0.07 2024-08-17 16:52:25,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3412000.0, ans=0.2 2024-08-17 16:52:37,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3412100.0, ans=0.125 2024-08-17 16:53:00,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=22.5 2024-08-17 16:53:06,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3412300.0, ans=0.0 2024-08-17 16:53:16,749 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 9 from Vox, 37 fro AS 2024-08-17 16:53:28,877 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9250, loss[loss=0.1228, beats_loss=0.01052, ecapa_loss=0.0001568, whisper_loss=0.1107, over 22857.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001496, whisper_loss=0.09067, over 3914283.70 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:53:38,054 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
23 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-17 16:53:39,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3412500.0, ans=0.1 2024-08-17 16:53:54,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-08-17 16:54:18,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3412800.0, ans=0.2 2024-08-17 16:54:24,314 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.401e+01 2.711e+01 2.916e+01 4.438e+01, threshold=5.422e+01, percent-clipped=0.0 2024-08-17 16:54:39,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9300, loss[loss=0.119, beats_loss=0.009248, ecapa_loss=0.000143, whisper_loss=0.1083, over 18295.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01055, ecapa_loss=0.0001497, whisper_loss=0.09132, over 3921527.77 frames. ], batch size: 68, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:54:39,807 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. limit=22.5 2024-08-17 16:55:00,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3413100.0, ans=0.09899494936611666 2024-08-17 16:55:12,387 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 9 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-17 16:55:40,001 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 16:55:48,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9350, loss[loss=0.1331, beats_loss=0.008461, ecapa_loss=0.0001592, whisper_loss=0.123, over 23131.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01054, ecapa_loss=0.0001498, whisper_loss=0.09174, over 3927858.42 frames. ], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:55:49,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3413500.0, ans=0.1 2024-08-17 16:56:03,223 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 16:56:42,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.341e+01 2.647e+01 3.045e+01 1.659e+02, threshold=5.294e+01, percent-clipped=2.0 2024-08-17 16:56:55,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9400, loss[loss=0.11, beats_loss=0.0101, ecapa_loss=0.0001132, whisper_loss=0.09878, over 16145.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01059, ecapa_loss=0.0001496, whisper_loss=0.09146, over 3928846.20 frames. ], batch size: 59, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:57:03,380 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-17 16:57:28,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.13 vs. limit=15.0 2024-08-17 16:57:40,568 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-17 16:57:55,281 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-17 16:57:57,946 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 16:58:00,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9450, loss[loss=0.09598, beats_loss=0.01204, ecapa_loss=0.000149, whisper_loss=0.08245, over 21245.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0106, ecapa_loss=0.0001488, whisper_loss=0.09119, over 3929043.99 frames. 
], batch size: 88, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:58:00,340 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-17 16:58:18,597 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 16:58:21,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3414600.0, ans=0.1 2024-08-17 16:58:51,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.271e+01 2.511e+01 2.774e+01 4.657e+01, threshold=5.021e+01, percent-clipped=0.0 2024-08-17 16:58:56,264 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2024-08-17 16:58:58,452 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-17 16:59:04,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9500, loss[loss=0.1088, beats_loss=0.01082, ecapa_loss=0.000115, whisper_loss=0.09687, over 20273.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01058, ecapa_loss=0.0001492, whisper_loss=0.09078, over 3882518.81 frames. ], batch size: 77, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 16:59:22,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=12.0 2024-08-17 16:59:25,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3415100.0, ans=0.1 2024-08-17 16:59:26,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. 
limit=6.0 2024-08-17 16:59:29,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3415200.0, ans=0.125 2024-08-17 16:59:38,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3415200.0, ans=0.0 2024-08-17 16:59:54,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3415400.0, ans=0.2 2024-08-17 16:59:57,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3415400.0, ans=0.125 2024-08-17 16:59:59,662 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-17 17:00:03,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3415400.0, ans=0.125 2024-08-17 17:00:06,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3415400.0, ans=0.125 2024-08-17 17:00:08,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9550, loss[loss=0.1043, beats_loss=0.01141, ecapa_loss=0.0001695, whisper_loss=0.09116, over 21690.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001487, whisper_loss=0.09043, over 3829195.08 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:00:16,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3415500.0, ans=0.0 2024-08-17 17:00:29,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3415600.0, ans=0.015 2024-08-17 17:00:30,620 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-17 17:00:51,714 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-17 17:00:53,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3415800.0, ans=0.0 2024-08-17 17:00:57,909 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.226e+01 2.523e+01 2.913e+01 4.217e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-17 17:00:59,405 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 17:01:03,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3415900.0, ans=0.1 2024-08-17 17:01:05,985 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-17 17:01:10,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9600, loss[loss=0.1074, beats_loss=0.01142, ecapa_loss=0.0001327, whisper_loss=0.09465, over 21831.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001484, whisper_loss=0.09151, over 3843986.96 frames. ], batch size: 86, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:01:11,090 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 17:01:35,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3416200.0, ans=0.125 2024-08-17 17:01:37,430 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-17 17:01:40,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3416200.0, ans=0.0 2024-08-17 17:01:41,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3416200.0, ans=0.125 2024-08-17 17:02:06,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3416400.0, ans=0.2 2024-08-17 17:02:07,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3416400.0, ans=0.1 2024-08-17 17:02:13,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9650, loss[loss=0.1019, beats_loss=0.009585, ecapa_loss=0.0001829, whisper_loss=0.0905, over 17824.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.000149, whisper_loss=0.09075, over 3863405.79 frames. ], batch size: 74, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:02:36,309 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 18 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 17:02:52,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3416800.0, ans=0.1 2024-08-17 17:02:54,255 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 17:02:58,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=22.5 2024-08-17 17:02:59,189 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-17 17:03:03,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.357e+01 2.621e+01 2.968e+01 4.527e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-17 17:03:16,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9700, loss[loss=0.1023, beats_loss=0.01071, ecapa_loss=0.000157, whisper_loss=0.08997, over 21851.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.000149, whisper_loss=0.09013, over 3844523.32 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:03:25,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-17 17:03:28,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3417100.0, ans=0.2 2024-08-17 17:03:30,506 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 17:03:36,751 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 17:03:38,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3417100.0, ans=0.2 2024-08-17 17:03:53,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3417300.0, ans=0.05 2024-08-17 17:03:57,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3417300.0, ans=0.2 2024-08-17 17:04:04,218 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 17:04:19,306 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9750, loss[loss=0.1071, beats_loss=0.0101, ecapa_loss=0.0001725, whisper_loss=0.09528, over 22583.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.0107, ecapa_loss=0.0001466, whisper_loss=0.08955, over 3848456.09 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:04:22,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2024-08-17 17:04:23,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.78 vs. limit=22.5 2024-08-17 17:04:38,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-08-17 17:04:46,403 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 17:04:47,618 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-17 17:04:49,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3417700.0, ans=0.125 2024-08-17 17:04:51,416 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
15 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-17 17:04:59,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3417800.0, ans=0.0 2024-08-17 17:05:05,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3417800.0, ans=0.125 2024-08-17 17:05:06,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3417800.0, ans=0.0 2024-08-17 17:05:11,015 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.328e+01 2.610e+01 2.968e+01 3.624e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-17 17:05:17,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.40 vs. limit=22.5 2024-08-17 17:05:23,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9800, loss[loss=0.08715, beats_loss=0.01291, ecapa_loss=0.0001063, whisper_loss=0.07318, over 14207.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001475, whisper_loss=0.08957, over 3820226.04 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:05:30,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3418000.0, ans=0.0 2024-08-17 17:05:30,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3418000.0, ans=0.05 2024-08-17 17:05:37,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3418100.0, ans=0.125 2024-08-17 17:05:38,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.99 vs. 
limit=15.0 2024-08-17 17:06:04,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.91 vs. limit=10.0 2024-08-17 17:06:07,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.56 vs. limit=10.0 2024-08-17 17:06:20,419 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 17:06:27,858 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9850, loss[loss=0.1159, beats_loss=0.008052, ecapa_loss=0.0001716, whisper_loss=0.1061, over 15824.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001471, whisper_loss=0.08992, over 3832535.01 frames. ], batch size: 65, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:06:40,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3418600.0, ans=0.125 2024-08-17 17:06:59,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3418700.0, ans=0.125 2024-08-17 17:07:06,072 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.72 vs. limit=12.0 2024-08-17 17:07:17,678 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.311e+01 2.569e+01 2.955e+01 4.968e+01, threshold=5.138e+01, percent-clipped=0.0 2024-08-17 17:07:21,622 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 17:07:27,688 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-17 17:07:30,182 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9900, loss[loss=0.09794, beats_loss=0.009243, ecapa_loss=0.0001906, whisper_loss=0.08679, over 22303.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.09083, over 3874197.07 frames. ], batch size: 93, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:07:32,784 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 28 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-17 17:07:35,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3419000.0, ans=0.0 2024-08-17 17:07:54,135 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 17:08:06,939 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:08:06,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3419300.0, ans=10.0 2024-08-17 17:08:32,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 9950, loss[loss=0.1225, beats_loss=0.009904, ecapa_loss=0.0001739, whisper_loss=0.1109, over 22923.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001468, whisper_loss=0.09039, over 3862854.23 frames. ], batch size: 90, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:08:40,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3419500.0, ans=0.125 2024-08-17 17:08:50,375 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-17 17:08:52,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.96 vs. 
limit=22.5 2024-08-17 17:09:02,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3419700.0, ans=0.1 2024-08-17 17:09:07,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.10 vs. limit=10.0 2024-08-17 17:09:08,265 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-17 17:09:22,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.277e+01 2.547e+01 2.876e+01 4.081e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-17 17:09:31,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3419900.0, ans=22.5 2024-08-17 17:09:33,118 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-17 17:09:35,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10000, loss[loss=0.08369, beats_loss=0.009359, ecapa_loss=0.000144, whisper_loss=0.0729, over 14988.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001476, whisper_loss=0.09044, over 3870257.75 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:09:36,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.01 vs. limit=10.0 2024-08-17 17:09:42,053 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 17:09:44,588 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
30 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 17:09:48,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3420100.0, ans=0.1 2024-08-17 17:09:58,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3420100.0, ans=0.2 2024-08-17 17:10:06,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3420200.0, ans=0.125 2024-08-17 17:10:38,221 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10050, loss[loss=0.1075, beats_loss=0.01102, ecapa_loss=0.0001328, whisper_loss=0.0951, over 23083.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001479, whisper_loss=0.09033, over 3871451.47 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:10:42,088 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-17 17:10:45,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3420500.0, ans=0.125 2024-08-17 17:10:58,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3420600.0, ans=0.125 2024-08-17 17:11:06,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3420700.0, ans=0.125 2024-08-17 17:11:27,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3420900.0, ans=0.125 2024-08-17 17:11:28,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.359e+01 2.579e+01 2.991e+01 2.335e+02, threshold=5.159e+01, percent-clipped=2.0 2024-08-17 17:11:34,414 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 17:11:37,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3420900.0, ans=0.1 2024-08-17 17:11:40,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10100, loss[loss=0.1083, beats_loss=0.009685, ecapa_loss=0.0001517, whisper_loss=0.09707, over 23166.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001487, whisper_loss=0.08993, over 3904881.34 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:11:58,159 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 17:11:58,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3421100.0, ans=0.0 2024-08-17 17:12:12,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3421200.0, ans=0.07 2024-08-17 17:12:17,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3421300.0, ans=0.2 2024-08-17 17:12:31,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=12.0 2024-08-17 17:12:32,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3421400.0, ans=0.1 2024-08-17 17:12:40,030 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 17:12:41,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3421400.0, ans=0.125 2024-08-17 17:12:43,572 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10150, loss[loss=0.1052, beats_loss=0.01009, ecapa_loss=0.0001191, whisper_loss=0.09392, over 20217.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001509, whisper_loss=0.08982, over 3914828.15 frames. ], batch size: 77, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:12:45,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3421500.0, ans=0.0 2024-08-17 17:12:47,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3421500.0, ans=0.125 2024-08-17 17:13:07,564 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 17:13:11,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2024-08-17 17:13:20,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3421800.0, ans=0.125 2024-08-17 17:13:33,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.337e+01 2.592e+01 2.912e+01 4.503e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-17 17:13:33,574 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-17 17:13:40,973 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 17:13:45,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10200, loss[loss=0.1021, beats_loss=0.01217, ecapa_loss=0.0001394, whisper_loss=0.08855, over 21257.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001502, whisper_loss=0.08997, over 3903033.88 frames. ], batch size: 85, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:13:57,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.50 vs. 
limit=10.0 2024-08-17 17:13:58,910 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2024-08-17 17:13:59,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3422100.0, ans=0.1 2024-08-17 17:14:02,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.85 vs. limit=15.0 2024-08-17 17:14:30,518 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 17:14:39,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3422400.0, ans=0.1 2024-08-17 17:14:41,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.96 vs. limit=10.0 2024-08-17 17:14:46,047 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-08-17 17:14:47,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10250, loss[loss=0.1198, beats_loss=0.009353, ecapa_loss=0.00016, whisper_loss=0.1088, over 23071.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001496, whisper_loss=0.09099, over 3903483.47 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:15:04,176 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
25 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 17:15:04,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3422600.0, ans=0.2 2024-08-17 17:15:04,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3422600.0, ans=0.0 2024-08-17 17:15:17,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3422700.0, ans=0.0 2024-08-17 17:15:24,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3422800.0, ans=0.125 2024-08-17 17:15:29,635 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:15:37,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.362e+01 2.622e+01 2.945e+01 4.411e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-17 17:15:40,497 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 17:15:50,576 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10300, loss[loss=0.0964, beats_loss=0.0108, ecapa_loss=0.0001359, whisper_loss=0.08424, over 14129.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01053, ecapa_loss=0.0001495, whisper_loss=0.09146, over 3892280.38 frames. ], batch size: 55, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:15:51,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.83 vs. limit=6.0 2024-08-17 17:15:53,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.38 vs. 
limit=22.5 2024-08-17 17:15:55,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3423000.0, ans=0.125 2024-08-17 17:16:29,786 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.70 vs. limit=22.5 2024-08-17 17:16:43,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3423400.0, ans=0.0 2024-08-17 17:16:52,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10350, loss[loss=0.09786, beats_loss=0.008901, ecapa_loss=0.0001557, whisper_loss=0.0874, over 16739.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01058, ecapa_loss=0.0001491, whisper_loss=0.09119, over 3870989.48 frames. ], batch size: 63, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:16:58,248 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 17:17:01,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3423500.0, ans=0.0 2024-08-17 17:17:09,762 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 17:17:13,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3423600.0, ans=0.125 2024-08-17 17:17:34,749 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-17 17:17:40,091 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 17:17:43,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.300e+01 2.585e+01 2.974e+01 4.112e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-17 17:17:44,957 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-17 17:17:49,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3423900.0, ans=0.125 2024-08-17 17:17:56,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10400, loss[loss=0.1016, beats_loss=0.01144, ecapa_loss=0.0001225, whisper_loss=0.08898, over 16722.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001491, whisper_loss=0.09123, over 3881074.79 frames. ], batch size: 66, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:17:57,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-17 17:18:10,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3424100.0, ans=0.125 2024-08-17 17:18:17,512 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 17:18:17,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3424100.0, ans=0.0 2024-08-17 17:18:19,954 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 17:18:32,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3424300.0, ans=0.2 2024-08-17 17:18:57,148 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-17 17:18:58,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10450, loss[loss=0.09374, beats_loss=0.009741, ecapa_loss=0.000158, whisper_loss=0.08242, over 22943.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01056, ecapa_loss=0.0001493, whisper_loss=0.09093, over 3886115.88 frames. 
], batch size: 93, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:18:59,437 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 17:18:59,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.43 vs. limit=15.0 2024-08-17 17:19:03,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424500.0, ans=0.1 2024-08-17 17:19:22,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3424700.0, ans=0.1 2024-08-17 17:19:35,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3424800.0, ans=0.125 2024-08-17 17:19:37,872 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 29 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 17:19:44,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3424800.0, ans=0.0 2024-08-17 17:19:47,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3424900.0, ans=0.1 2024-08-17 17:19:47,895 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.411e+01 2.718e+01 3.284e+01 2.620e+02, threshold=5.436e+01, percent-clipped=5.0 2024-08-17 17:19:54,435 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 17:20:00,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10500, loss[loss=0.1126, beats_loss=0.01006, ecapa_loss=0.00014, whisper_loss=0.1012, over 23282.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001487, whisper_loss=0.09089, over 3889040.99 frames. 
], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:20:07,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-08-17 17:20:20,973 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-17 17:20:21,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3425100.0, ans=0.1 2024-08-17 17:20:27,289 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-17 17:20:31,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3425200.0, ans=0.1 2024-08-17 17:20:32,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-17 17:20:33,107 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 17:20:36,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3425300.0, ans=0.125 2024-08-17 17:20:48,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3425300.0, ans=0.0 2024-08-17 17:20:55,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.12 vs. 
limit=12.0 2024-08-17 17:20:59,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3425400.0, ans=0.1 2024-08-17 17:21:02,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10550, loss[loss=0.113, beats_loss=0.01008, ecapa_loss=0.0001562, whisper_loss=0.1013, over 22164.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001493, whisper_loss=0.091, over 3929922.36 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:21:10,151 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-17 17:21:17,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3425600.0, ans=0.125 2024-08-17 17:21:24,910 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 17:21:26,111 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 17:21:28,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3425700.0, ans=0.125 2024-08-17 17:21:29,669 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 17:21:34,382 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-17 17:21:36,809 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 17:21:40,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3425800.0, ans=0.015 2024-08-17 17:21:40,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3425800.0, ans=0.125 2024-08-17 17:21:42,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.22 vs. limit=15.0 2024-08-17 17:21:42,929 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-17 17:21:45,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3425800.0, ans=0.0 2024-08-17 17:21:46,800 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-17 17:21:51,591 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.414e+01 2.704e+01 3.069e+01 2.193e+02, threshold=5.408e+01, percent-clipped=2.0 2024-08-17 17:21:56,604 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 17:22:03,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10600, loss[loss=0.1228, beats_loss=0.009386, ecapa_loss=0.000148, whisper_loss=0.1119, over 16737.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.0001484, whisper_loss=0.09072, over 3896201.78 frames. ], batch size: 64, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:22:06,753 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-17 17:22:09,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3426000.0, ans=0.0 2024-08-17 17:22:11,591 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
26 from LS+wenet, 24 from Vox, 44 from AS 2024-08-17 17:22:13,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3426000.0, ans=0.2 2024-08-17 17:22:15,245 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 22 from Vox, 27 from AS 2024-08-17 17:22:17,743 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-17 17:22:33,880 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 17:22:35,092 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 26 from Vox, 41 from AS 2024-08-17 17:22:44,367 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 23 from Vox, 39 from AS 2024-08-17 17:22:53,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3426400.0, ans=0.125 2024-08-17 17:23:01,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.49 vs. limit=15.0 2024-08-17 17:23:03,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3426400.0, ans=0.0 2024-08-17 17:23:06,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10650, loss[loss=0.09941, beats_loss=0.01226, ecapa_loss=0.000129, whisper_loss=0.08586, over 22820.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001474, whisper_loss=0.09023, over 3901547.37 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:23:06,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3426500.0, ans=0.2 2024-08-17 17:23:08,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.56 vs.
limit=22.5 2024-08-17 17:23:16,826 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 from AS 2024-08-17 17:23:23,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3426600.0, ans=0.035 2024-08-17 17:23:41,549 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 from AS 2024-08-17 17:23:51,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3426800.0, ans=0.0 2024-08-17 17:23:56,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.375e+01 2.587e+01 2.940e+01 1.166e+02, threshold=5.174e+01, percent-clipped=1.0 2024-08-17 17:24:06,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=12.0 2024-08-17 17:24:08,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10700, loss[loss=0.08996, beats_loss=0.01253, ecapa_loss=0.0001408, whisper_loss=0.07602, over 21158.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001475, whisper_loss=0.09051, over 3892414.40 frames. ], batch size: 87, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:24:10,316 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 15 from Vox, 36 from AS 2024-08-17 17:24:15,330 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 from AS 2024-08-17 17:24:18,898 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
21 from LS+wenet, 14 from Vox, 25 from AS 2024-08-17 17:24:37,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3427200.0, ans=0.09899494936611666 2024-08-17 17:24:44,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3427200.0, ans=0.125 2024-08-17 17:24:45,320 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.353e+00 2024-08-17 17:24:59,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3427400.0, ans=0.1 2024-08-17 17:25:00,030 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 13 from Vox, 29 from AS 2024-08-17 17:25:10,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10750, loss[loss=0.09882, beats_loss=0.01065, ecapa_loss=0.0001289, whisper_loss=0.08688, over 22761.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01063, ecapa_loss=0.000147, whisper_loss=0.09088, over 3902899.70 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:25:16,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3427500.0, ans=0.0 2024-08-17 17:25:23,410 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 from AS 2024-08-17 17:25:24,737 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 16 from Vox, 41 from AS 2024-08-17 17:25:44,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3427700.0, ans=0.1 2024-08-17 17:25:45,694 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 15 from Vox, 23 from AS 2024-08-17 17:25:52,267 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts.
22 from LS+wenet, 28 from Vox, 21 from AS 2024-08-17 17:26:00,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.471e+01 2.708e+01 3.054e+01 4.365e+01, threshold=5.417e+01, percent-clipped=0.0 2024-08-17 17:26:00,390 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 17:26:02,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.23 vs. limit=6.0 2024-08-17 17:26:04,417 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 15 from Vox, 29 from AS 2024-08-17 17:26:11,695 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 27 from Vox, 40 from AS 2024-08-17 17:26:12,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10800, loss[loss=0.09328, beats_loss=0.01144, ecapa_loss=0.0001613, whisper_loss=0.08023, over 21537.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001467, whisper_loss=0.09054, over 3887879.58 frames. ], batch size: 92, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:26:20,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3428000.0, ans=0.2 2024-08-17 17:26:24,212 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 23 from Vox, 30 from AS 2024-08-17 17:26:41,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3428200.0, ans=0.025 2024-08-17 17:26:45,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.70 vs. limit=15.0 2024-08-17 17:26:47,144 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
27 from LS+wenet, 17 from Vox, 25 from AS 2024-08-17 17:26:48,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3428200.0, ans=0.1 2024-08-17 17:27:00,759 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 32 from Vox, 28 from AS 2024-08-17 17:27:05,866 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 31 from LS+wenet, 17 from Vox, 21 from AS 2024-08-17 17:27:15,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10850, loss[loss=0.12, beats_loss=0.006265, ecapa_loss=0.0001665, whisper_loss=0.1121, over 15025.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001481, whisper_loss=0.09071, over 3862589.86 frames. ], batch size: 58, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:27:23,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=15.0 2024-08-17 17:27:31,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3428600.0, ans=22.5 2024-08-17 17:27:32,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3428600.0, ans=0.0 2024-08-17 17:27:35,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3428600.0, ans=0.125 2024-08-17 17:27:36,860 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
40 from LS+wenet, 21 from Vox, 29 from AS 2024-08-17 17:27:45,555 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07737734913825989, model_norm_threshold=54.16817855834961 2024-08-17 17:27:45,711 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.205e+05, grad_sumsq=1.205e+05, orig_rms_sq=1.000e+00 2024-08-17 17:27:45,844 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 from AS 2024-08-17 17:27:46,967 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 from AS 2024-08-17 17:27:53,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3428800.0, ans=0.125 2024-08-17 17:27:55,804 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 21 from Vox, 26 from AS 2024-08-17 17:28:00,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3428800.0, ans=0.05 2024-08-17 17:28:05,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.361e+01 2.674e+01 3.069e+01 7.001e+02, threshold=5.348e+01, percent-clipped=1.0 2024-08-17 17:28:17,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3429000.0, ans=0.125 2024-08-17 17:28:18,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10900, loss[loss=0.09765, beats_loss=0.0106, ecapa_loss=0.0001614, whisper_loss=0.08544, over 17140.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01061, ecapa_loss=0.0001479, whisper_loss=0.09051, over 3868433.38 frames. ], batch size: 69, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:28:24,658 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts.
21 from LS+wenet, 19 from Vox, 36 from AS 2024-08-17 17:28:35,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3429100.0, ans=0.2 2024-08-17 17:28:39,333 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 15 from Vox, 35 from AS 2024-08-17 17:28:43,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3429200.0, ans=0.125 2024-08-17 17:28:46,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=12.0 2024-08-17 17:28:51,471 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2024-08-17 17:29:01,721 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 23 from Vox, 43 from AS 2024-08-17 17:29:02,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3429300.0, ans=0.1 2024-08-17 17:29:17,763 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 from AS 2024-08-17 17:29:20,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 10950, loss[loss=0.09564, beats_loss=0.01215, ecapa_loss=0.0001624, whisper_loss=0.08187, over 22071.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001473, whisper_loss=0.09015, over 3871346.11 frames. ], batch size: 94, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:29:20,290 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 from AS 2024-08-17 17:29:23,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.53 vs.
limit=12.0 2024-08-17 17:29:35,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=17.44 vs. limit=15.0 2024-08-17 17:29:43,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3429600.0, ans=0.0 2024-08-17 17:29:44,171 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 29 from Vox, 26 from AS 2024-08-17 17:29:49,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3429700.0, ans=0.1 2024-08-17 17:29:55,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3429700.0, ans=0.125 2024-08-17 17:30:07,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3429800.0, ans=0.0 2024-08-17 17:30:08,845 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 from AS 2024-08-17 17:30:10,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.341e+01 2.537e+01 2.893e+01 3.514e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-17 17:30:11,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3429900.0, ans=0.1 2024-08-17 17:30:22,499 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11000, loss[loss=0.1245, beats_loss=0.01092, ecapa_loss=0.0001087, whisper_loss=0.1124, over 22941.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001478, whisper_loss=0.09067, over 3873243.49 frames.
], batch size: 83, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:30:28,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=15.0 2024-08-17 17:30:35,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3430100.0, ans=0.09899494936611666 2024-08-17 17:30:36,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3430100.0, ans=0.2 2024-08-17 17:30:51,023 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 25 from Vox, 34 from AS 2024-08-17 17:30:55,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3430200.0, ans=0.1 2024-08-17 17:31:11,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3430400.0, ans=0.125 2024-08-17 17:31:12,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3430400.0, ans=0.125 2024-08-17 17:31:24,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11050, loss[loss=0.0975, beats_loss=0.009243, ecapa_loss=0.0002044, whisper_loss=0.08621, over 13873.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001489, whisper_loss=0.09005, over 3868744.74 frames. ], batch size: 56, lr: 2.64e-03, grad_scale: 1.152921504606847e+18 2024-08-17 17:31:29,593 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
20 from LS+wenet, 18 from Vox, 22 from AS 2024-08-17 17:31:29,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3430500.0, ans=0.2 2024-08-17 17:31:29,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3430500.0, ans=0.0 2024-08-17 17:31:34,522 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 from AS 2024-08-17 17:31:37,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3430600.0, ans=0.95 2024-08-17 17:31:42,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3430600.0, ans=0.125 2024-08-17 17:31:44,736 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 11 from Vox, 28 from AS 2024-08-17 17:31:45,951 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 from AS 2024-08-17 17:31:47,145 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 23 from Vox, 26 from AS 2024-08-17 17:31:55,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3430700.0, ans=0.125 2024-08-17 17:32:08,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3430800.0, ans=0.1 2024-08-17 17:32:12,317 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.29 vs.
limit=15.0 2024-08-17 17:32:15,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.316e+01 2.591e+01 2.930e+01 4.532e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 17:32:25,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 23, batch 11100, loss[loss=0.1012, beats_loss=0.01132, ecapa_loss=0.0001513, whisper_loss=0.08841, over 22418.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01066, ecapa_loss=0.0001486, whisper_loss=0.09025, over 3873042.22 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:32:52,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0 2024-08-17 17:33:01,962 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 10 from Vox, 27 from AS 2024-08-17 17:33:05,574 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 from AS 2024-08-17 17:33:10,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3431300.0, ans=0.1 2024-08-17 17:33:11,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3431300.0, ans=0.125 2024-08-17 17:33:46,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 0, loss[loss=0.09081, beats_loss=0.01111, ecapa_loss=0.0001263, whisper_loss=0.07844, over 23839.00 frames. ], tot_loss[loss=0.09081, beats_loss=0.01111, ecapa_loss=0.0001263, whisper_loss=0.07844, over 23839.00 frames.
], batch size: 91, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:33:46,614 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 17:33:57,789 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.6067, 1.4921, 2.6679, 2.6146], device='cuda:3') 2024-08-17 17:34:22,041 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on ASR_libri: loss=0.2501, beats_loss=0, ecapa_loss=0.0005267, whisper_loss=0.2449, over 922467.00 frames. 2024-08-17 17:34:36,614 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on SV_voxceleb1: loss=0.004161, beats_loss=0, ecapa_loss=0.0004161, whisper_loss=0, over 939242.00 frames. 2024-08-17 17:36:22,873 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on AT_audioset: loss=0.02331, beats_loss=0.02331, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 17:36:22,884 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 17:36:23,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3431420.0, ans=0.125 2024-08-17 17:36:28,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3431420.0, ans=0.125 2024-08-17 17:36:31,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3431420.0, ans=0.125 2024-08-17 17:36:36,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.34 vs. limit=22.5 2024-08-17 17:36:56,963 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
20 from LS+wenet, 10 from Vox, 29 from AS 2024-08-17 17:37:15,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3431720.0, ans=0.125 2024-08-17 17:37:18,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3431720.0, ans=0.0 2024-08-17 17:37:19,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.89 vs. limit=15.0 2024-08-17 17:37:33,652 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 14 from Vox, 38 from AS 2024-08-17 17:37:49,134 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.526e+01 2.875e+01 3.255e+01 4.776e+01, threshold=5.751e+01, percent-clipped=0.0 2024-08-17 17:37:51,225 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 50, loss[loss=0.1178, beats_loss=0.008764, ecapa_loss=0.000164, whisper_loss=0.1074, over 22557.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009753, ecapa_loss=0.0001499, whisper_loss=0.09109, over 913851.03 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:38:07,252 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 15 from Vox, 32 from AS 2024-08-17 17:38:12,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.17 vs.
limit=6.0 2024-08-17 17:38:25,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3432120.0, ans=0.125 2024-08-17 17:38:27,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3432120.0, ans=0.0 2024-08-17 17:38:32,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3432120.0, ans=0.2 2024-08-17 17:38:39,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3432120.0, ans=0.125 2024-08-17 17:38:45,444 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 from AS 2024-08-17 17:39:17,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3432420.0, ans=0.0 2024-08-17 17:39:18,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 100, loss[loss=0.08877, beats_loss=0.01091, ecapa_loss=0.0001765, whisper_loss=0.0761, over 20958.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.009594, ecapa_loss=0.00015, whisper_loss=0.09119, over 1577423.35 frames. ], batch size: 91, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:39:23,063 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
23 from LS+wenet, 16 from Vox, 27 from AS 2024-08-17 17:39:25,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3432420.0, ans=0.0 2024-08-17 17:39:28,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3432420.0, ans=0.125 2024-08-17 17:39:37,420 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:39:39,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3432520.0, ans=0.0 2024-08-17 17:39:39,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3432520.0, ans=0.125 2024-08-17 17:39:53,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3432620.0, ans=0.2 2024-08-17 17:39:59,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3432620.0, ans=0.0 2024-08-17 17:40:19,110 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 from AS 2024-08-17 17:40:21,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3432720.0, ans=0.1 2024-08-17 17:40:39,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3432820.0, ans=0.0 2024-08-17 17:40:41,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.203e+01 2.631e+01 2.865e+01 3.182e+01 4.534e+01, threshold=5.730e+01, percent-clipped=0.0 2024-08-17 17:40:42,701 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 150, loss[loss=0.08906, beats_loss=0.01071, ecapa_loss=0.0001266, whisper_loss=0.07708, over 22673.00 frames.
], tot_loss[loss=0.1017, beats_loss=0.009662, ecapa_loss=0.0001501, whisper_loss=0.09055, over 2088397.12 frames. ], batch size: 88, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:40:53,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3432920.0, ans=0.125 2024-08-17 17:41:01,513 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 from AS 2024-08-17 17:41:09,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3433120.0, ans=0.04949747468305833 2024-08-17 17:41:10,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3433120.0, ans=0.2 2024-08-17 17:41:10,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2024-08-17 17:41:11,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3433120.0, ans=0.1 2024-08-17 17:41:16,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433120.0, ans=0.1 2024-08-17 17:41:35,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3433220.0, ans=0.2 2024-08-17 17:41:38,727 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
33 from LS+wenet, 22 from Vox, 33 from AS 2024-08-17 17:41:40,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3433320.0, ans=0.125 2024-08-17 17:41:46,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3433320.0, ans=0.0 2024-08-17 17:41:48,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3433320.0, ans=0.125 2024-08-17 17:41:50,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 200, loss[loss=0.1098, beats_loss=0.0119, ecapa_loss=0.0001427, whisper_loss=0.09647, over 15544.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009866, ecapa_loss=0.000152, whisper_loss=0.09034, over 2464105.49 frames. ], batch size: 59, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:42:01,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3433420.0, ans=0.2 2024-08-17 17:42:07,579 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 from AS 2024-08-17 17:42:09,892 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 17 from Vox, 22 from AS 2024-08-17 17:42:22,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3433620.0, ans=0.2 2024-08-17 17:42:22,927 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 25 from Vox, 29 from AS 2024-08-17 17:42:45,091 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
29 from LS+wenet, 22 from Vox, 43 from AS 2024-08-17 17:42:53,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.081e+01 2.383e+01 2.582e+01 2.960e+01 4.276e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-17 17:42:54,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 250, loss[loss=0.08791, beats_loss=0.009376, ecapa_loss=0.0001811, whisper_loss=0.07672, over 16739.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01007, ecapa_loss=0.0001497, whisper_loss=0.08964, over 2778285.38 frames. ], batch size: 68, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:43:00,185 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS 2024-08-17 17:43:00,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3433920.0, ans=0.1 2024-08-17 17:43:32,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3434220.0, ans=0.125 2024-08-17 17:43:33,851 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-17 17:43:44,182 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 20 from Vox, 30 from AS 2024-08-17 17:43:54,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3434320.0, ans=0.0 2024-08-17 17:43:59,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 300, loss[loss=0.08864, beats_loss=0.01161, ecapa_loss=0.0001553, whisper_loss=0.07547, over 13942.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01018, ecapa_loss=0.0001487, whisper_loss=0.08966, over 2975639.68 frames.
], batch size: 55, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:44:13,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3434520.0, ans=0.0 2024-08-17 17:44:15,621 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 from AS 2024-08-17 17:44:16,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3434520.0, ans=0.2 2024-08-17 17:44:36,383 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 28 from Vox, 32 from AS 2024-08-17 17:44:59,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.24 vs. limit=6.0 2024-08-17 17:45:01,341 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 18 from Vox, 21 from AS 2024-08-17 17:45:05,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.312e+01 2.476e+01 2.764e+01 4.106e+02, threshold=4.951e+01, percent-clipped=1.0 2024-08-17 17:45:06,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 350, loss[loss=0.1114, beats_loss=0.01109, ecapa_loss=0.0001329, whisper_loss=0.09894, over 19908.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01033, ecapa_loss=0.0001482, whisper_loss=0.08897, over 3143448.41 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:45:08,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs.
limit=6.0 2024-08-17 17:45:10,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3434920.0, ans=0.1 2024-08-17 17:45:18,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.96 vs. limit=22.5 2024-08-17 17:45:19,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3435020.0, ans=0.2 2024-08-17 17:45:24,410 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-17 17:45:24,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3435020.0, ans=0.0 2024-08-17 17:45:37,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3435120.0, ans=0.125 2024-08-17 17:45:38,493 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 17:45:41,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3435120.0, ans=0.125 2024-08-17 17:45:41,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3435120.0, ans=0.0 2024-08-17 17:45:49,147 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 17:45:55,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.75 vs. limit=22.5 2024-08-17 17:45:56,313 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
15 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 17:46:11,519 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=6.604e+00 2024-08-17 17:46:15,326 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 400, loss[loss=0.0999, beats_loss=0.009311, ecapa_loss=0.0001253, whisper_loss=0.08933, over 15558.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01032, ecapa_loss=0.000148, whisper_loss=0.08966, over 3269897.57 frames. ], batch size: 58, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:46:18,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3435420.0, ans=0.125 2024-08-17 17:46:20,635 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 17:46:26,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3435420.0, ans=0.05 2024-08-17 17:46:37,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3435520.0, ans=0.125 2024-08-17 17:46:41,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3435620.0, ans=0.0 2024-08-17 17:46:42,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3435620.0, ans=0.0 2024-08-17 17:47:02,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3435720.0, ans=0.125 2024-08-17 17:47:22,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.287e+01 2.548e+01 2.895e+01 1.655e+02, threshold=5.097e+01, percent-clipped=3.0 2024-08-17 17:47:23,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 450, loss[loss=0.07881, beats_loss=0.008822, ecapa_loss=0.0001588, 
whisper_loss=0.0684, over 18313.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0103, ecapa_loss=0.0001495, whisper_loss=0.08972, over 3393164.25 frames. ], batch size: 69, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:47:32,001 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-17 17:47:33,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3435920.0, ans=0.0 2024-08-17 17:48:07,499 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-17 17:48:14,308 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 30 from Vox, 38 fro AS 2024-08-17 17:48:20,755 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-17 17:48:27,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=15.0 2024-08-17 17:48:29,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3436320.0, ans=0.2 2024-08-17 17:48:30,418 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-17 17:48:31,553 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 500, loss[loss=0.09143, beats_loss=0.01115, ecapa_loss=0.0001351, whisper_loss=0.07893, over 22747.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01045, ecapa_loss=0.0001493, whisper_loss=0.08887, over 3503466.88 frames. 
], batch size: 92, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:48:38,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3436420.0, ans=0.04949747468305833 2024-08-17 17:48:42,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3436420.0, ans=0.0 2024-08-17 17:48:47,160 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 17:48:51,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3436520.0, ans=0.125 2024-08-17 17:49:02,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3436620.0, ans=0.125 2024-08-17 17:49:02,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3436620.0, ans=0.125 2024-08-17 17:49:12,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3436720.0, ans=0.2 2024-08-17 17:49:15,888 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 17:49:20,200 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.322e+01 2024-08-17 17:49:24,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3436720.0, ans=0.1 2024-08-17 17:49:31,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3436820.0, ans=0.125 2024-08-17 17:49:35,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3436820.0, ans=0.125 2024-08-17 17:49:38,495 INFO 
[scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3436820.0, ans=0.0 2024-08-17 17:49:39,163 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.385e+01 2.606e+01 2.957e+01 2.283e+02, threshold=5.212e+01, percent-clipped=2.0 2024-08-17 17:49:40,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 550, loss[loss=0.1032, beats_loss=0.01022, ecapa_loss=0.0001522, whisper_loss=0.09144, over 21881.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001487, whisper_loss=0.08949, over 3589197.34 frames. ], batch size: 89, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:49:57,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.07 vs. limit=22.5 2024-08-17 17:50:04,805 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 17:50:16,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3437120.0, ans=0.0 2024-08-17 17:50:19,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3437120.0, ans=0.125 2024-08-17 17:50:20,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3437120.0, ans=0.125 2024-08-17 17:50:22,494 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-17 17:50:27,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2024-08-17 17:50:50,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 600, loss[loss=0.1002, beats_loss=0.0102, ecapa_loss=0.0001348, whisper_loss=0.08864, over 21313.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001476, whisper_loss=0.08972, over 3653556.63 frames. ], batch size: 84, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:51:29,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3437620.0, ans=0.1 2024-08-17 17:51:32,251 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 39 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 17:51:48,135 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-17 17:51:50,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=15.0 2024-08-17 17:51:57,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.305e+01 2.570e+01 2.922e+01 6.139e+01, threshold=5.141e+01, percent-clipped=1.0 2024-08-17 17:51:58,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 650, loss[loss=0.1147, beats_loss=0.009721, ecapa_loss=0.0001199, whisper_loss=0.1038, over 20472.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01041, ecapa_loss=0.000147, whisper_loss=0.08904, over 3694995.03 frames. ], batch size: 75, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:52:14,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.67 vs. limit=22.5 2024-08-17 17:52:47,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3438220.0, ans=0.0 2024-08-17 17:52:51,336 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
21 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 17:52:51,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3438220.0, ans=0.5 2024-08-17 17:53:01,656 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-17 17:53:06,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3438320.0, ans=0.1 2024-08-17 17:53:09,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 700, loss[loss=0.09307, beats_loss=0.009882, ecapa_loss=0.0001413, whisper_loss=0.08178, over 15384.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001463, whisper_loss=0.08963, over 3730977.59 frames. ], batch size: 59, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:53:18,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3438420.0, ans=0.0 2024-08-17 17:53:29,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3438520.0, ans=0.0 2024-08-17 17:53:44,031 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-17 17:54:10,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3438820.0, ans=0.125 2024-08-17 17:54:26,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.293e+01 2.490e+01 2.735e+01 3.624e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-17 17:54:27,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 750, loss[loss=0.07341, beats_loss=0.01104, ecapa_loss=0.0001452, whisper_loss=0.06091, over 14886.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001462, whisper_loss=0.08944, over 3759263.03 frames. 
], batch size: 60, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:54:29,056 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-17 17:54:35,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3438920.0, ans=0.0 2024-08-17 17:54:58,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3439120.0, ans=0.125 2024-08-17 17:55:02,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3439120.0, ans=0.125 2024-08-17 17:55:20,006 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 17:55:23,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3439220.0, ans=0.2 2024-08-17 17:55:30,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3439320.0, ans=0.0 2024-08-17 17:55:48,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 800, loss[loss=0.1093, beats_loss=0.01107, ecapa_loss=0.0001398, whisper_loss=0.09681, over 21783.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0104, ecapa_loss=0.0001472, whisper_loss=0.08968, over 3780596.18 frames. ], batch size: 85, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:55:57,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. 
limit=10.0 2024-08-17 17:56:05,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3439520.0, ans=0.0 2024-08-17 17:56:07,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3439520.0, ans=0.125 2024-08-17 17:56:08,130 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 17:56:13,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3439520.0, ans=0.0 2024-08-17 17:56:35,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3439720.0, ans=0.125 2024-08-17 17:56:57,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3439820.0, ans=0.09899494936611666 2024-08-17 17:57:03,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.287e+01 2.519e+01 2.785e+01 3.931e+01, threshold=5.037e+01, percent-clipped=0.0 2024-08-17 17:57:04,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3439920.0, ans=0.125 2024-08-17 17:57:05,175 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 850, loss[loss=0.1057, beats_loss=0.0114, ecapa_loss=0.0001312, whisper_loss=0.09297, over 21910.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01053, ecapa_loss=0.0001459, whisper_loss=0.08852, over 3796051.98 frames. ], batch size: 86, lr: 2.58e-03, grad_scale: 5.764607523034235e+17 2024-08-17 17:57:54,317 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-17 17:58:08,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3440220.0, ans=0.2 2024-08-17 17:58:14,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2024-08-17 17:58:16,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3440220.0, ans=0.125 2024-08-17 17:58:51,119 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 900, loss[loss=0.08885, beats_loss=0.01082, ecapa_loss=0.0001817, whisper_loss=0.07621, over 21519.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001468, whisper_loss=0.089, over 3802222.66 frames. ], batch size: 92, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 17:58:52,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3440420.0, ans=0.125 2024-08-17 17:58:56,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3440420.0, ans=0.0 2024-08-17 17:59:08,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3440420.0, ans=0.0 2024-08-17 17:59:11,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3440520.0, ans=0.125 2024-08-17 17:59:12,235 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-17 17:59:15,126 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-17 17:59:34,964 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-17 17:59:49,582 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 18:00:35,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.343e+01 2.504e+01 2.806e+01 4.204e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-17 18:00:35,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 950, loss[loss=0.07895, beats_loss=0.01022, ecapa_loss=0.0001369, whisper_loss=0.06736, over 16000.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001461, whisper_loss=0.08886, over 3779270.41 frames. ], batch size: 62, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:00:43,022 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 18:00:51,834 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 18:00:55,133 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 18:00:55,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3440920.0, ans=0.0 2024-08-17 18:00:56,848 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 18:01:15,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3441020.0, ans=0.125 2024-08-17 18:01:41,898 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 18:02:06,358 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
22 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-17 18:02:07,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3441320.0, ans=0.125 2024-08-17 18:02:21,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3441320.0, ans=0.125 2024-08-17 18:02:30,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1000, loss[loss=0.09185, beats_loss=0.00759, ecapa_loss=0.0001325, whisper_loss=0.08293, over 15084.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.08904, over 3798156.04 frames. ], batch size: 55, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:02:51,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.67 vs. limit=15.0 2024-08-17 18:02:58,955 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 18:03:08,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=15.0 2024-08-17 18:03:13,326 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-17 18:03:15,228 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 20 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-17 18:03:23,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3441620.0, ans=0.0 2024-08-17 18:03:27,552 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 18:03:48,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.04 vs. 
limit=22.5 2024-08-17 18:03:52,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3441720.0, ans=0.125 2024-08-17 18:04:08,875 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-17 18:04:12,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3441820.0, ans=0.035 2024-08-17 18:04:19,323 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 18:04:22,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.27 vs. limit=22.5 2024-08-17 18:04:26,147 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.287e+01 2.499e+01 2.719e+01 4.151e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-17 18:04:26,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1050, loss[loss=0.11, beats_loss=0.008929, ecapa_loss=0.0001406, whisper_loss=0.09967, over 19133.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001446, whisper_loss=0.08898, over 3777571.51 frames. ], batch size: 73, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:04:27,335 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.095e-02 2024-08-17 18:04:42,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3441920.0, ans=0.125 2024-08-17 18:04:48,458 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
31 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-17 18:04:48,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3442020.0, ans=0.125 2024-08-17 18:05:07,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.60 vs. limit=22.5 2024-08-17 18:05:32,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3442220.0, ans=0.2 2024-08-17 18:05:38,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3442320.0, ans=0.125 2024-08-17 18:05:45,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.64 vs. limit=15.0 2024-08-17 18:05:53,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1100, loss[loss=0.1009, beats_loss=0.0131, ecapa_loss=0.0001393, whisper_loss=0.08645, over 23693.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001449, whisper_loss=0.08932, over 3760674.50 frames. 
], batch size: 93, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:06:08,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3442520.0, ans=0.125 2024-08-17 18:06:14,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3442520.0, ans=0.1 2024-08-17 18:06:14,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3442520.0, ans=0.125 2024-08-17 18:06:18,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3442520.0, ans=0.125 2024-08-17 18:06:22,533 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 13 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 18:06:41,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.02 vs. limit=15.0 2024-08-17 18:07:06,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.407e+01 2.705e+01 2.966e+01 4.079e+01, threshold=5.411e+01, percent-clipped=0.0 2024-08-17 18:07:06,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1150, loss[loss=0.0958, beats_loss=0.00949, ecapa_loss=0.00014, whisper_loss=0.08491, over 17314.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001442, whisper_loss=0.08914, over 3778553.84 frames. ], batch size: 66, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:07:17,724 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 18:07:21,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3443020.0, ans=0.125 2024-08-17 18:07:21,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3443020.0, ans=0.125 2024-08-17 18:07:24,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3443020.0, ans=0.125 2024-08-17 18:07:26,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3443020.0, ans=0.125 2024-08-17 18:07:39,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3443120.0, ans=0.0 2024-08-17 18:07:50,943 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0 2024-08-17 18:08:15,064 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 17 from LS+wenet, 27 from Vox, 46 fro AS 2024-08-17 18:08:17,622 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 18:08:19,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1200, loss[loss=0.09639, beats_loss=0.01076, ecapa_loss=0.0001449, whisper_loss=0.08418, over 15431.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001451, whisper_loss=0.08977, over 3782333.73 frames. 
], batch size: 59, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:08:26,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3443420.0, ans=0.0 2024-08-17 18:08:35,415 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=15.0 2024-08-17 18:08:37,473 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 18:08:41,776 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-17 18:08:47,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3443620.0, ans=0.125 2024-08-17 18:08:55,020 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.805e+01 2024-08-17 18:09:04,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3443720.0, ans=0.125 2024-08-17 18:09:05,215 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 18:09:13,746 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 18:09:15,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3443720.0, ans=0.125 2024-08-17 18:09:33,374 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.226e+01 2.659e+01 3.142e+01 2.875e+02, threshold=5.318e+01, percent-clipped=2.0 2024-08-17 18:09:33,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1250, loss[loss=0.09323, beats_loss=0.01042, ecapa_loss=0.0001654, whisper_loss=0.08116, over 18206.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.000145, whisper_loss=0.09012, over 3811988.19 frames. ], batch size: 73, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:09:38,789 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2024-08-17 18:09:42,024 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 18:09:53,396 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 18:10:00,027 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-17 18:10:05,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.43 vs. limit=15.0 2024-08-17 18:10:30,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3444220.0, ans=10.0 2024-08-17 18:10:43,998 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 18:10:48,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1300, loss[loss=0.09241, beats_loss=0.0121, ecapa_loss=0.0001273, whisper_loss=0.07903, over 23494.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001458, whisper_loss=0.08971, over 3796502.51 frames. ], batch size: 93, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:10:54,413 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 18:10:57,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3444420.0, ans=0.0 2024-08-17 18:11:01,024 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 18:11:09,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3444520.0, ans=0.0 2024-08-17 18:11:13,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3444520.0, ans=0.2 2024-08-17 18:11:24,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-17 18:11:25,281 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 18:11:28,068 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06072128564119339, model_norm_threshold=53.17913055419922 2024-08-17 18:11:28,236 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.036e+05, grad_sumsq=1.803e+05, orig_rms_sq=5.745e-01 2024-08-17 18:11:28,504 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-17 18:12:05,241 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.210e+01 2.591e+01 3.011e+01 8.758e+02, threshold=5.182e+01, percent-clipped=3.0 2024-08-17 18:12:05,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1350, loss[loss=0.09953, beats_loss=0.009587, ecapa_loss=0.0001464, whisper_loss=0.08848, over 22650.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001451, whisper_loss=0.08986, over 3808501.91 frames. 
], batch size: 88, lr: 2.58e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:12:11,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3444920.0, ans=0.1 2024-08-17 18:12:11,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3444920.0, ans=0.125 2024-08-17 18:12:30,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3445020.0, ans=0.5 2024-08-17 18:12:36,997 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-17 18:12:44,409 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 18:12:45,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3445120.0, ans=0.2 2024-08-17 18:13:04,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3445320.0, ans=0.125 2024-08-17 18:13:09,151 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 28 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-17 18:13:11,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. limit=10.0 2024-08-17 18:13:20,691 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1400, loss[loss=0.07267, beats_loss=0.008975, ecapa_loss=0.0001478, whisper_loss=0.06221, over 17468.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001438, whisper_loss=0.08958, over 3808919.76 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:13:32,644 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
28 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 18:13:37,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=3445520.0, ans=0.2 2024-08-17 18:13:43,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=19.00 vs. limit=15.0 2024-08-17 18:13:54,155 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 22 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-17 18:14:10,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3445720.0, ans=0.02 2024-08-17 18:14:22,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3445820.0, ans=0.125 2024-08-17 18:14:25,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3445820.0, ans=0.125 2024-08-17 18:14:27,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3445820.0, ans=0.1 2024-08-17 18:15:06,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.243e+01 2.508e+01 2.795e+01 3.559e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-17 18:15:06,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1450, loss[loss=0.1089, beats_loss=0.01079, ecapa_loss=0.0001345, whisper_loss=0.09677, over 19324.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.09011, over 3818872.51 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:15:38,300 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-17 18:15:44,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3446120.0, ans=0.0 2024-08-17 18:15:54,194 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 18:15:58,778 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-17 18:16:15,857 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 18:16:20,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1500, loss[loss=0.1136, beats_loss=0.009777, ecapa_loss=0.0001284, whisper_loss=0.1026, over 17929.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01062, ecapa_loss=0.0001434, whisper_loss=0.08874, over 3816971.35 frames. ], batch size: 68, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:16:22,396 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-17 18:16:25,957 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.36 vs. limit=15.0 2024-08-17 18:16:29,743 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-17 18:16:36,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3446520.0, ans=15.0 2024-08-17 18:16:46,553 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-17 18:16:48,255 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 18:16:51,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3446620.0, ans=0.0 2024-08-17 18:16:57,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.53 vs. limit=15.0 2024-08-17 18:17:00,474 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-17 18:17:03,903 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-17 18:17:07,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3446720.0, ans=0.125 2024-08-17 18:17:10,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3446720.0, ans=0.125 2024-08-17 18:17:16,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3446720.0, ans=0.2 2024-08-17 18:17:28,574 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-17 18:17:29,882 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-17 18:17:36,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.317e+01 2.500e+01 2.871e+01 1.026e+02, threshold=5.000e+01, percent-clipped=3.0 2024-08-17 18:17:36,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1550, loss[loss=0.09392, beats_loss=0.01119, ecapa_loss=0.0001522, whisper_loss=0.08122, over 19358.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.08913, over 3849157.27 frames. 
], batch size: 81, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:17:41,770 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 18:18:13,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3447120.0, ans=0.2 2024-08-17 18:18:19,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3447120.0, ans=0.0 2024-08-17 18:18:35,155 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 18:18:41,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.37 vs. limit=22.5 2024-08-17 18:18:50,447 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-17 18:18:51,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1600, loss[loss=0.1052, beats_loss=0.008293, ecapa_loss=0.0001709, whisper_loss=0.09519, over 21558.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.0106, ecapa_loss=0.0001442, whisper_loss=0.0882, over 3823261.21 frames. ], batch size: 88, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:19:06,061 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 18:19:13,237 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-17 18:19:30,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3447620.0, ans=0.125 2024-08-17 18:19:52,321 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 27 from LS+wenet, 14 from Vox, 16 fro AS 2024-08-17 18:19:55,451 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
36 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 18:20:04,365 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 18:20:05,387 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.245e+01 2.501e+01 2.930e+01 4.153e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-17 18:20:05,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1650, loss[loss=0.1169, beats_loss=0.00942, ecapa_loss=0.0001678, whisper_loss=0.1058, over 22617.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01055, ecapa_loss=0.0001426, whisper_loss=0.08918, over 3838304.52 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:20:14,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3447920.0, ans=0.2 2024-08-17 18:20:20,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3448020.0, ans=0.0 2024-08-17 18:20:33,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3448120.0, ans=0.0 2024-08-17 18:20:50,862 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 18:20:53,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3448220.0, ans=0.0 2024-08-17 18:21:01,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3448220.0, ans=0.2 2024-08-17 18:21:12,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.28 vs. limit=6.0 2024-08-17 18:21:17,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1700, loss[loss=0.09809, beats_loss=0.00981, ecapa_loss=0.0001734, whisper_loss=0.08654, over 18598.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001433, whisper_loss=0.09004, over 3825855.61 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:21:23,292 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-17 18:21:29,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3448420.0, ans=0.0 2024-08-17 18:21:30,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3448520.0, ans=0.125 2024-08-17 18:21:48,292 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 18:22:01,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3448720.0, ans=0.0 2024-08-17 18:22:05,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3448720.0, ans=0.0 2024-08-17 18:22:10,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3448720.0, ans=0.1 2024-08-17 18:22:23,652 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 16 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-17 18:22:26,057 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.381e+01 2.629e+01 2.847e+01 4.282e+01, threshold=5.258e+01, percent-clipped=0.0 2024-08-17 18:22:26,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1750, loss[loss=0.09868, beats_loss=0.009777, ecapa_loss=0.0001796, whisper_loss=0.0871, over 20828.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01043, ecapa_loss=0.0001445, whisper_loss=0.0894, over 3821868.74 frames. 
], batch size: 91, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:22:30,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-17 18:22:42,255 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 18:22:42,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3449020.0, ans=0.125 2024-08-17 18:22:47,969 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.040e+05 2024-08-17 18:22:52,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3449120.0, ans=0.125 2024-08-17 18:23:08,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3449220.0, ans=0.0 2024-08-17 18:23:22,027 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 18:23:23,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.38 vs. limit=22.5 2024-08-17 18:23:33,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1800, loss[loss=0.08, beats_loss=0.01267, ecapa_loss=0.0001447, whisper_loss=0.06588, over 17977.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001439, whisper_loss=0.0893, over 3829735.59 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:23:36,426 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 18:23:39,109 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 18:23:40,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3449420.0, ans=0.2 2024-08-17 18:23:41,752 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-17 18:23:44,721 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-17 18:24:08,160 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-17 18:24:09,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3449620.0, ans=0.125 2024-08-17 18:24:41,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3449920.0, ans=0.125 2024-08-17 18:24:42,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.197e+01 2.415e+01 2.703e+01 3.683e+01, threshold=4.830e+01, percent-clipped=0.0 2024-08-17 18:24:42,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1850, loss[loss=0.09213, beats_loss=0.01192, ecapa_loss=0.0001077, whisper_loss=0.07914, over 15544.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01045, ecapa_loss=0.0001434, whisper_loss=0.08924, over 3826475.51 frames. ], batch size: 60, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:24:52,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3449920.0, ans=0.2 2024-08-17 18:25:15,945 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
15 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-17 18:25:17,564 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.227e-01 2024-08-17 18:25:25,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-08-17 18:25:30,765 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-17 18:25:33,632 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 18:25:39,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-08-17 18:25:42,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3450320.0, ans=0.125 2024-08-17 18:25:50,903 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1900, loss[loss=0.1091, beats_loss=0.01002, ecapa_loss=0.0001545, whisper_loss=0.09754, over 21885.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001427, whisper_loss=0.08948, over 3809221.56 frames. ], batch size: 92, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:25:58,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3450420.0, ans=0.125 2024-08-17 18:26:06,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3450520.0, ans=0.125 2024-08-17 18:26:24,180 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 18:26:55,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3450820.0, ans=0.1 2024-08-17 18:26:59,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.266e+01 2.492e+01 2.719e+01 3.794e+02, threshold=4.984e+01, percent-clipped=0.0 2024-08-17 18:26:59,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 1950, loss[loss=0.07974, beats_loss=0.009813, ecapa_loss=0.000168, whisper_loss=0.06825, over 19175.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.08867, over 3828767.15 frames. ], batch size: 79, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:27:09,698 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-17 18:27:17,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.32 vs. limit=10.0 2024-08-17 18:27:18,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3451020.0, ans=0.1 2024-08-17 18:27:23,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3451020.0, ans=0.125 2024-08-17 18:27:25,277 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 18:27:34,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3451120.0, ans=0.0 2024-08-17 18:27:37,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3451220.0, ans=0.125 2024-08-17 18:27:38,297 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
15 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-17 18:27:47,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3451220.0, ans=0.2 2024-08-17 18:27:59,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451320.0, ans=0.1 2024-08-17 18:28:04,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2000, loss[loss=0.1196, beats_loss=0.008517, ecapa_loss=0.0001292, whisper_loss=0.1098, over 20681.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01049, ecapa_loss=0.0001431, whisper_loss=0.08929, over 3823055.38 frames. ], batch size: 76, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:28:16,813 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-17 18:28:17,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3451520.0, ans=0.125 2024-08-17 18:28:19,382 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-17 18:28:26,430 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 18:29:08,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.38 vs. limit=22.5 2024-08-17 18:29:12,766 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.396e+01 2.689e+01 3.009e+01 4.514e+01, threshold=5.377e+01, percent-clipped=1.0 2024-08-17 18:29:12,788 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2050, loss[loss=0.1128, beats_loss=0.01025, ecapa_loss=0.0001365, whisper_loss=0.1011, over 23892.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.0001431, whisper_loss=0.08866, over 3830739.32 frames. 
], batch size: 90, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:29:13,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3451920.0, ans=0.1 2024-08-17 18:29:36,088 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08474668860435486, model_norm_threshold=53.77110290527344 2024-08-17 18:29:36,258 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.640e+04, grad_sumsq=7.640e+04, orig_rms_sq=1.000e+00 2024-08-17 18:29:38,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3452120.0, ans=0.125 2024-08-17 18:29:39,045 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 18:29:39,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.84 vs. limit=15.0 2024-08-17 18:29:57,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=12.0 2024-08-17 18:30:05,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.19 vs. limit=15.0 2024-08-17 18:30:09,255 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-17 18:30:13,429 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-17 18:30:16,111 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 18:30:18,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2100, loss[loss=0.07869, beats_loss=0.01296, ecapa_loss=0.0001334, whisper_loss=0.06439, over 22118.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01059, ecapa_loss=0.0001431, whisper_loss=0.08877, over 3839544.82 frames. ], batch size: 94, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:30:24,934 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-17 18:30:27,591 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-17 18:30:35,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3452520.0, ans=0.125 2024-08-17 18:30:43,186 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 18:30:49,762 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 18:30:54,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3452620.0, ans=0.125 2024-08-17 18:30:58,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.86 vs. 
limit=15.0 2024-08-17 18:31:04,740 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:31:06,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3452720.0, ans=0.125 2024-08-17 18:31:23,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.347e+01 2.592e+01 2.946e+01 6.345e+02, threshold=5.183e+01, percent-clipped=4.0 2024-08-17 18:31:23,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2150, loss[loss=0.1052, beats_loss=0.009905, ecapa_loss=0.0001567, whisper_loss=0.09376, over 17769.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01063, ecapa_loss=0.0001431, whisper_loss=0.08872, over 3847239.83 frames. ], batch size: 72, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:31:28,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.66 vs. limit=22.5 2024-08-17 18:31:29,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3452920.0, ans=0.0 2024-08-17 18:31:58,406 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 18:32:04,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3453220.0, ans=0.125 2024-08-17 18:32:17,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0 2024-08-17 18:32:20,700 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 18:32:23,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3453320.0, ans=0.0 2024-08-17 18:32:29,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2200, loss[loss=0.07325, beats_loss=0.01445, ecapa_loss=0.0001497, whisper_loss=0.05731, over 15786.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001434, whisper_loss=0.08973, over 3861956.92 frames. ], batch size: 69, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:32:36,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=3453420.0, ans=22.5 2024-08-17 18:32:49,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3453520.0, ans=0.0 2024-08-17 18:32:49,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3453520.0, ans=0.125 2024-08-17 18:32:49,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3453520.0, ans=0.125 2024-08-17 18:32:56,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3453620.0, ans=0.125 2024-08-17 18:32:59,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3453620.0, ans=0.07 2024-08-17 18:33:33,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.333e+01 2.532e+01 2.820e+01 1.498e+02, threshold=5.063e+01, percent-clipped=1.0 2024-08-17 18:33:34,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2250, loss[loss=0.08877, beats_loss=0.009937, ecapa_loss=0.0001438, whisper_loss=0.07739, over 20180.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.09045, over 3860286.77 frames. ], batch size: 79, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:33:42,306 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 18:33:45,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3453920.0, ans=0.1 2024-08-17 18:34:03,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3454120.0, ans=0.125 2024-08-17 18:34:25,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3454320.0, ans=0.125 2024-08-17 18:34:32,490 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-17 18:34:32,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3454320.0, ans=0.125 2024-08-17 18:34:36,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3454320.0, ans=0.0 2024-08-17 18:34:37,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3454320.0, ans=0.2 2024-08-17 18:34:40,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2300, loss[loss=0.1015, beats_loss=0.008564, ecapa_loss=0.0001847, whisper_loss=0.09108, over 19653.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09096, over 3877492.60 frames. 
], batch size: 82, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:34:45,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3454420.0, ans=0.125 2024-08-17 18:34:47,625 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 18:35:04,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3454620.0, ans=0.125 2024-08-17 18:35:14,909 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-17 18:35:20,438 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 18:35:21,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.53 vs. limit=22.5 2024-08-17 18:35:44,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.369e+01 2.599e+01 2.912e+01 4.410e+01, threshold=5.198e+01, percent-clipped=0.0 2024-08-17 18:35:44,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2350, loss[loss=0.09716, beats_loss=0.01087, ecapa_loss=0.0001682, whisper_loss=0.0846, over 18448.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001475, whisper_loss=0.09089, over 3840309.37 frames. 
], batch size: 77, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:35:51,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3454920.0, ans=0.125 2024-08-17 18:35:56,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3455020.0, ans=0.0 2024-08-17 18:35:56,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3455020.0, ans=0.125 2024-08-17 18:36:05,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3455020.0, ans=0.1 2024-08-17 18:36:14,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3455120.0, ans=0.1 2024-08-17 18:36:30,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3455220.0, ans=0.09899494936611666 2024-08-17 18:36:30,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3455220.0, ans=0.125 2024-08-17 18:36:32,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.08 vs. limit=12.0 2024-08-17 18:36:33,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3455220.0, ans=0.125 2024-08-17 18:36:36,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2024-08-17 18:36:45,089 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
20 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-17 18:36:51,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2400, loss[loss=0.07885, beats_loss=0.01084, ecapa_loss=0.0001732, whisper_loss=0.06628, over 21295.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001468, whisper_loss=0.0908, over 3844127.04 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:36:56,195 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 18:37:10,745 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-17 18:37:11,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3455520.0, ans=0.1 2024-08-17 18:37:39,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3455720.0, ans=0.125 2024-08-17 18:37:49,811 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-17 18:37:56,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3455820.0, ans=0.125 2024-08-17 18:37:56,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2024-08-17 18:38:05,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.211e+01 2.408e+01 2.768e+01 3.443e+01, threshold=4.816e+01, percent-clipped=0.0 2024-08-17 18:38:05,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2450, loss[loss=0.1282, beats_loss=0.01017, ecapa_loss=0.0001428, whisper_loss=0.1166, over 15404.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001462, whisper_loss=0.09045, over 3854153.69 frames. 
], batch size: 60, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:38:24,632 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.234e-03 2024-08-17 18:38:39,569 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-17 18:38:43,041 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 18:38:57,643 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-17 18:39:03,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3456220.0, ans=0.125 2024-08-17 18:39:07,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-17 18:39:07,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-17 18:39:26,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2500, loss[loss=0.1035, beats_loss=0.01123, ecapa_loss=0.0001436, whisper_loss=0.09085, over 14525.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001464, whisper_loss=0.09088, over 3859824.31 frames. 
], batch size: 58, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:39:36,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3456420.0, ans=0.125 2024-08-17 18:39:46,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3456520.0, ans=0.0 2024-08-17 18:40:02,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3456620.0, ans=0.1 2024-08-17 18:40:21,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.56 vs. limit=10.0 2024-08-17 18:40:33,209 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 18:40:41,780 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.335e+01 2.460e+01 2.782e+01 3.981e+01, threshold=4.921e+01, percent-clipped=0.0 2024-08-17 18:40:41,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2550, loss[loss=0.1061, beats_loss=0.0117, ecapa_loss=0.0001733, whisper_loss=0.0927, over 18209.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.0912, over 3876411.08 frames. ], batch size: 76, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:40:47,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. 
limit=22.5 2024-08-17 18:40:52,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3456920.0, ans=0.0 2024-08-17 18:40:56,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3456920.0, ans=0.125 2024-08-17 18:41:07,089 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-17 18:41:12,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3457020.0, ans=0.1 2024-08-17 18:41:21,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3457120.0, ans=0.1 2024-08-17 18:41:24,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3457120.0, ans=0.2 2024-08-17 18:41:26,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3457120.0, ans=0.125 2024-08-17 18:41:26,977 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-17 18:41:37,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3457220.0, ans=0.125 2024-08-17 18:42:01,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2600, loss[loss=0.09527, beats_loss=0.01198, ecapa_loss=0.0001363, whisper_loss=0.08192, over 18071.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001457, whisper_loss=0.09128, over 3883157.17 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:42:11,842 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-17 18:42:17,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3457520.0, ans=0.1 2024-08-17 18:42:27,572 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08065144717693329, model_norm_threshold=49.205039978027344 2024-08-17 18:42:27,746 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.182e+04, grad_sumsq=8.182e+04, orig_rms_sq=1.000e+00 2024-08-17 18:42:33,448 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 18:42:40,874 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 18:42:48,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2024-08-17 18:42:55,685 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-17 18:42:58,578 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-17 18:43:14,413 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.414e+01 2.571e+01 2.892e+01 6.101e+02, threshold=5.143e+01, percent-clipped=1.0 2024-08-17 18:43:14,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2650, loss[loss=0.111, beats_loss=0.01168, ecapa_loss=0.0001151, whisper_loss=0.09817, over 18382.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001459, whisper_loss=0.09046, over 3858813.39 frames. ], batch size: 69, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:43:20,775 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 18:43:21,370 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.88 vs. limit=22.5 2024-08-17 18:43:26,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3457920.0, ans=0.1 2024-08-17 18:43:33,124 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 11 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 18:43:34,384 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-17 18:43:46,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3458120.0, ans=0.125 2024-08-17 18:43:54,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3458120.0, ans=0.1 2024-08-17 18:43:59,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3458220.0, ans=0.125 2024-08-17 18:44:08,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3458220.0, ans=0.125 2024-08-17 18:44:18,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3458320.0, ans=0.09899494936611666 2024-08-17 18:44:25,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2700, loss[loss=0.09917, beats_loss=0.009167, ecapa_loss=0.0001826, whisper_loss=0.08817, over 15992.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.000145, whisper_loss=0.08924, over 3848166.18 frames. 
], batch size: 66, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:44:28,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3458420.0, ans=10.0 2024-08-17 18:44:42,523 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-17 18:44:48,521 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-17 18:44:51,040 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-17 18:45:03,688 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 18:45:04,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3458620.0, ans=0.125 2024-08-17 18:45:29,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.08 vs. limit=15.0 2024-08-17 18:45:37,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.308e+01 2.578e+01 2.796e+01 3.722e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 18:45:37,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2750, loss[loss=0.1022, beats_loss=0.007977, ecapa_loss=0.0001734, whisper_loss=0.09244, over 19152.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001453, whisper_loss=0.09029, over 3860014.85 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:45:37,801 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
25 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-17 18:45:50,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3459020.0, ans=0.1 2024-08-17 18:45:53,123 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 18:46:10,186 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 18:46:17,300 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-17 18:46:18,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3459220.0, ans=0.1 2024-08-17 18:46:30,019 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 18:46:30,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3459220.0, ans=0.0 2024-08-17 18:46:48,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2800, loss[loss=0.1103, beats_loss=0.0102, ecapa_loss=0.0001264, whisper_loss=0.0988, over 19328.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001449, whisper_loss=0.09065, over 3846649.63 frames. ], batch size: 77, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:46:58,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-17 18:47:13,827 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
14 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-17 18:47:14,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3459520.0, ans=0.0 2024-08-17 18:47:24,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3459620.0, ans=0.1 2024-08-17 18:47:25,280 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 18:47:27,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3459620.0, ans=0.2 2024-08-17 18:47:37,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3459720.0, ans=0.0 2024-08-17 18:47:45,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3459720.0, ans=0.125 2024-08-17 18:48:04,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.294e+01 2.641e+01 2.883e+01 4.874e+01, threshold=5.283e+01, percent-clipped=0.0 2024-08-17 18:48:04,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2850, loss[loss=0.09754, beats_loss=0.01157, ecapa_loss=0.0001395, whisper_loss=0.08457, over 22240.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.09066, over 3829746.61 frames. 
], batch size: 91, lr: 2.57e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 18:48:10,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3459920.0, ans=0.0 2024-08-17 18:48:14,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3459920.0, ans=0.125 2024-08-17 18:48:17,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3459920.0, ans=0.125 2024-08-17 18:48:22,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3460020.0, ans=0.125 2024-08-17 18:48:30,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3460020.0, ans=0.125 2024-08-17 18:48:57,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3460220.0, ans=0.125 2024-08-17 18:49:00,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3460220.0, ans=0.125 2024-08-17 18:49:11,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3460320.0, ans=0.125 2024-08-17 18:49:21,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2900, loss[loss=0.1109, beats_loss=0.008372, ecapa_loss=0.0001732, whisper_loss=0.1008, over 23070.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001466, whisper_loss=0.09108, over 3866192.37 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:49:41,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.28 vs. 
limit=15.0 2024-08-17 18:49:51,446 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-17 18:49:56,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-17 18:50:04,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460620.0, ans=0.1 2024-08-17 18:50:16,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3460720.0, ans=0.1 2024-08-17 18:50:21,278 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-17 18:50:28,556 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 18:50:32,515 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-17 18:50:34,077 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 31 from Vox, 40 fro AS 2024-08-17 18:50:35,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.046e+01 2.363e+01 2.529e+01 2.876e+01 1.646e+02, threshold=5.058e+01, percent-clipped=2.0 2024-08-17 18:50:35,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 2950, loss[loss=0.08005, beats_loss=0.01195, ecapa_loss=0.0001667, whisper_loss=0.06643, over 20748.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001464, whisper_loss=0.09055, over 3866196.64 frames. ], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:50:50,612 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 18:50:51,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. 
limit=15.0 2024-08-17 18:51:04,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3461120.0, ans=0.125 2024-08-17 18:51:21,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3461220.0, ans=0.125 2024-08-17 18:51:35,722 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-08-17 18:51:42,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.41 vs. limit=22.5 2024-08-17 18:51:43,052 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 18:51:44,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3000, loss[loss=0.1126, beats_loss=0.009476, ecapa_loss=0.0001279, whisper_loss=0.1018, over 19285.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001476, whisper_loss=0.091, over 3908371.62 frames. ], batch size: 73, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:51:44,144 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 18:52:21,646 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on ASR_libri: loss=0.2525, beats_loss=0, ecapa_loss=0.0005269, whisper_loss=0.2472, over 922467.00 frames. 2024-08-17 18:52:37,975 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on SV_voxceleb1: loss=0.00404, beats_loss=0, ecapa_loss=0.000404, whisper_loss=0, over 939242.00 frames. 2024-08-17 18:54:27,985 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 18:54:27,996 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 18:54:32,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3461420.0, ans=0.1 2024-08-17 18:54:43,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3461520.0, ans=0.1 2024-08-17 18:54:55,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3461620.0, ans=0.125 2024-08-17 18:54:58,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3461620.0, ans=0.0 2024-08-17 18:55:12,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3461720.0, ans=0.0 2024-08-17 18:55:18,847 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-17 18:55:38,523 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.375e+01 2.603e+01 2.812e+01 5.666e+01, threshold=5.206e+01, percent-clipped=2.0 2024-08-17 18:55:38,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3050, loss[loss=0.09377, beats_loss=0.01217, ecapa_loss=0.0001538, whisper_loss=0.08006, over 17909.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.09153, over 3928945.74 frames. 
], batch size: 74, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:55:49,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3461920.0, ans=0.035 2024-08-17 18:55:57,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3462020.0, ans=0.2 2024-08-17 18:56:23,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3462220.0, ans=0.125 2024-08-17 18:56:45,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3100, loss[loss=0.09733, beats_loss=0.0107, ecapa_loss=0.00022, whisper_loss=0.08443, over 18440.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01048, ecapa_loss=0.0001485, whisper_loss=0.09243, over 3942466.36 frames. ], batch size: 80, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:56:47,459 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 18:56:49,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3462420.0, ans=0.125 2024-08-17 18:56:51,295 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 18:57:02,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3462520.0, ans=0.2 2024-08-17 18:57:09,461 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 18:57:10,708 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-17 18:57:15,078 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-17 18:57:19,089 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-17 18:57:20,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2024-08-17 18:57:36,927 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-17 18:57:54,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.306e+01 2.653e+01 2.984e+01 6.724e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-17 18:57:54,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3150, loss[loss=0.1115, beats_loss=0.01024, ecapa_loss=0.000149, whisper_loss=0.09978, over 22411.00 frames. ], tot_loss[loss=0.1045, beats_loss=0.01051, ecapa_loss=0.0001481, whisper_loss=0.09247, over 3929368.02 frames. ], batch size: 88, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:57:54,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3462920.0, ans=0.1 2024-08-17 18:58:01,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3462920.0, ans=0.1 2024-08-17 18:58:25,406 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 12 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 18:58:32,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3463120.0, ans=0.0 2024-08-17 18:58:36,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3463220.0, ans=0.125 2024-08-17 18:58:39,365 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
19 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-17 18:58:51,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3463320.0, ans=0.2 2024-08-17 18:58:53,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3463320.0, ans=0.125 2024-08-17 18:59:03,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3200, loss[loss=0.07536, beats_loss=0.01106, ecapa_loss=0.0001829, whisper_loss=0.06248, over 13130.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001484, whisper_loss=0.09189, over 3908404.45 frames. ], batch size: 55, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 18:59:05,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3463420.0, ans=0.125 2024-08-17 18:59:16,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3463520.0, ans=0.125 2024-08-17 18:59:23,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3463520.0, ans=0.125 2024-08-17 18:59:23,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3463520.0, ans=0.125 2024-08-17 18:59:24,324 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-17 18:59:29,756 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-17 18:59:31,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.29 vs. 
limit=15.0 2024-08-17 18:59:43,184 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 18:59:43,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3463720.0, ans=0.1 2024-08-17 18:59:46,682 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 12 from Vox, 43 fro AS 2024-08-17 18:59:53,268 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 18:59:58,549 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-17 19:00:03,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3463820.0, ans=0.0 2024-08-17 19:00:10,529 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.969e+01 2.338e+01 2.602e+01 2.900e+01 3.800e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-17 19:00:10,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3250, loss[loss=0.1201, beats_loss=0.009488, ecapa_loss=0.0001289, whisper_loss=0.1093, over 21013.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01053, ecapa_loss=0.0001482, whisper_loss=0.09261, over 3912793.70 frames. ], batch size: 80, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:00:11,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3463920.0, ans=0.125 2024-08-17 19:00:11,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. 
limit=15.0 2024-08-17 19:00:16,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3463920.0, ans=0.0 2024-08-17 19:00:17,363 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-17 19:00:22,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.44 vs. limit=6.0 2024-08-17 19:00:33,005 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 19:00:34,325 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-17 19:00:36,819 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-17 19:00:37,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3464120.0, ans=0.5 2024-08-17 19:00:40,625 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 19:00:57,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2024-08-17 19:01:03,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3464320.0, ans=0.2 2024-08-17 19:01:16,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3300, loss[loss=0.1194, beats_loss=0.01069, ecapa_loss=0.000168, whisper_loss=0.1071, over 21585.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.01054, ecapa_loss=0.0001495, whisper_loss=0.09237, over 3912111.81 frames. 
], batch size: 91, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:01:17,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3464420.0, ans=0.125 2024-08-17 19:02:18,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3464820.0, ans=0.125 2024-08-17 19:02:22,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.671e+01 2.198e+01 2.485e+01 2.907e+01 4.364e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-17 19:02:22,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3350, loss[loss=0.1039, beats_loss=0.01056, ecapa_loss=0.0001339, whisper_loss=0.09201, over 17936.00 frames. ], tot_loss[loss=0.1049, beats_loss=0.01051, ecapa_loss=0.0001495, whisper_loss=0.0929, over 3900242.67 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:02:37,321 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
32 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-17 19:02:37,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3465020.0, ans=0.1 2024-08-17 19:02:38,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3465020.0, ans=0.125 2024-08-17 19:02:38,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3465020.0, ans=0.1 2024-08-17 19:02:41,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3465020.0, ans=0.125 2024-08-17 19:02:44,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3465020.0, ans=0.0 2024-08-17 19:02:51,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.14 vs. limit=22.5 2024-08-17 19:03:06,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3465220.0, ans=0.125 2024-08-17 19:03:09,789 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-17 19:03:11,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3465220.0, ans=0.125 2024-08-17 19:03:27,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3465420.0, ans=0.0 2024-08-17 19:03:28,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3400, loss[loss=0.109, beats_loss=0.009896, ecapa_loss=0.0001445, whisper_loss=0.09767, over 22281.00 frames. ], tot_loss[loss=0.1046, beats_loss=0.01048, ecapa_loss=0.0001493, whisper_loss=0.09258, over 3885625.39 frames. 
], batch size: 88, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:03:37,255 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 19:03:41,285 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-17 19:03:52,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3465520.0, ans=0.1 2024-08-17 19:04:08,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3465720.0, ans=0.0 2024-08-17 19:04:11,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.88 vs. limit=22.5 2024-08-17 19:04:15,529 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 19:04:21,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3465820.0, ans=0.05 2024-08-17 19:04:29,604 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 19:04:32,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3465920.0, ans=0.2 2024-08-17 19:04:33,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.314e+01 2.533e+01 2.860e+01 4.018e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-17 19:04:33,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3450, loss[loss=0.1074, beats_loss=0.01138, ecapa_loss=0.0001408, whisper_loss=0.09459, over 19658.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001494, whisper_loss=0.09106, over 3867320.67 frames. 
], batch size: 76, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:04:36,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3465920.0, ans=0.125 2024-08-17 19:05:01,085 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-17 19:05:09,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3466120.0, ans=0.125 2024-08-17 19:05:10,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3466120.0, ans=0.0 2024-08-17 19:05:28,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3466320.0, ans=0.125 2024-08-17 19:05:39,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3500, loss[loss=0.07726, beats_loss=0.01321, ecapa_loss=0.0001258, whisper_loss=0.0628, over 18675.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001491, whisper_loss=0.09043, over 3853841.40 frames. ], batch size: 74, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:05:44,630 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 19:05:45,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3466420.0, ans=0.1 2024-08-17 19:05:48,046 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-17 19:05:54,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3466520.0, ans=0.04949747468305833 2024-08-17 19:05:56,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.18 vs. 
limit=15.0 2024-08-17 19:05:56,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.43 vs. limit=12.0 2024-08-17 19:05:59,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3466520.0, ans=0.0 2024-08-17 19:06:02,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3466520.0, ans=0.2 2024-08-17 19:06:03,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3466520.0, ans=0.125 2024-08-17 19:06:04,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3466620.0, ans=0.125 2024-08-17 19:06:13,251 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:06:23,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3466720.0, ans=0.125 2024-08-17 19:06:23,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3466720.0, ans=0.125 2024-08-17 19:06:31,515 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 16 from Vox, 51 fro AS 2024-08-17 19:06:45,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.305e+01 2.521e+01 2.842e+01 7.889e+01, threshold=5.042e+01, percent-clipped=2.0 2024-08-17 19:06:45,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3550, loss[loss=0.09463, beats_loss=0.01062, ecapa_loss=0.0001389, whisper_loss=0.08263, over 21700.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001496, whisper_loss=0.0902, over 3868420.63 frames. 
], batch size: 86, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:06:58,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-17 19:07:24,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3467220.0, ans=0.125 2024-08-17 19:07:28,591 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 31 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-17 19:07:32,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467220.0, ans=0.1 2024-08-17 19:07:39,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3467320.0, ans=0.1 2024-08-17 19:07:52,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3600, loss[loss=0.06828, beats_loss=0.01171, ecapa_loss=0.0001554, whisper_loss=0.05502, over 16126.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001493, whisper_loss=0.08984, over 3873346.07 frames. ], batch size: 66, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:08:00,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3467420.0, ans=0.125 2024-08-17 19:08:04,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3467520.0, ans=0.2 2024-08-17 19:08:09,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-08-17 19:08:23,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.77 vs. 
limit=10.0 2024-08-17 19:08:29,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3467720.0, ans=0.2 2024-08-17 19:08:31,755 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-17 19:08:45,424 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 19:08:51,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3467820.0, ans=0.0 2024-08-17 19:08:52,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=3467820.0, ans=15.0 2024-08-17 19:08:53,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3467820.0, ans=0.125 2024-08-17 19:08:55,109 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.286e+01 2.581e+01 2.801e+01 4.273e+01, threshold=5.162e+01, percent-clipped=0.0 2024-08-17 19:08:55,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3650, loss[loss=0.09038, beats_loss=0.009486, ecapa_loss=0.0001658, whisper_loss=0.07923, over 19341.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001493, whisper_loss=0.09042, over 3885141.24 frames. ], batch size: 80, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:08:56,438 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-17 19:09:01,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3467920.0, ans=0.0 2024-08-17 19:09:19,904 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-17 19:09:22,261 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
29 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 19:09:26,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3468120.0, ans=0.125 2024-08-17 19:09:35,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3468220.0, ans=0.5 2024-08-17 19:09:47,676 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 19:09:57,693 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3700, loss[loss=0.1017, beats_loss=0.01004, ecapa_loss=0.0001241, whisper_loss=0.09044, over 20798.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.0905, over 3883001.03 frames. ], batch size: 79, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:10:12,801 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-17 19:10:16,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3468520.0, ans=0.125 2024-08-17 19:10:35,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3468720.0, ans=0.125 2024-08-17 19:10:38,992 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-17 19:10:40,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3468720.0, ans=0.1 2024-08-17 19:10:42,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3468720.0, ans=0.2 2024-08-17 19:10:44,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3468720.0, ans=0.125 2024-08-17 19:10:45,885 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 19:10:46,177 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:10:53,386 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 12 from Vox, 42 fro AS 2024-08-17 19:11:02,330 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.199e+01 2.477e+01 2.865e+01 5.326e+01, threshold=4.955e+01, percent-clipped=1.0 2024-08-17 19:11:02,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3750, loss[loss=0.1158, beats_loss=0.009663, ecapa_loss=0.0001441, whisper_loss=0.1047, over 15807.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.0001482, whisper_loss=0.08963, over 3896517.74 frames. ], batch size: 61, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:11:02,531 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 19:11:11,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3468920.0, ans=0.125 2024-08-17 19:11:12,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3468920.0, ans=0.125 2024-08-17 19:11:13,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3468920.0, ans=0.125 2024-08-17 19:11:23,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=12.0 2024-08-17 19:11:49,373 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09380995482206345, model_norm_threshold=49.54741287231445 2024-08-17 19:11:49,541 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.469e+04, grad_sumsq=6.357e+06, orig_rms_sq=1.018e-02 2024-08-17 19:11:53,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3469320.0, ans=0.1 2024-08-17 19:12:07,364 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3800, loss[loss=0.1039, beats_loss=0.01147, ecapa_loss=0.0001293, whisper_loss=0.09109, over 22437.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001487, whisper_loss=0.08987, over 3880915.72 frames. 
], batch size: 88, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:12:18,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3469420.0, ans=0.0 2024-08-17 19:12:21,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3469520.0, ans=0.0 2024-08-17 19:12:25,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3469520.0, ans=0.07 2024-08-17 19:12:28,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-17 19:12:31,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3469520.0, ans=0.125 2024-08-17 19:12:34,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3469620.0, ans=0.125 2024-08-17 19:12:38,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3469620.0, ans=0.125 2024-08-17 19:13:07,478 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-17 19:13:14,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.37 vs. limit=22.5 2024-08-17 19:13:16,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.982e+01 2.363e+01 2.589e+01 3.155e+01 5.282e+02, threshold=5.179e+01, percent-clipped=3.0 2024-08-17 19:13:16,176 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3850, loss[loss=0.1089, beats_loss=0.008813, ecapa_loss=0.0001958, whisper_loss=0.09809, over 16487.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001493, whisper_loss=0.09051, over 3869494.28 frames. ], batch size: 67, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:13:32,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3470020.0, ans=0.2 2024-08-17 19:13:45,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3470120.0, ans=0.125 2024-08-17 19:13:57,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3470120.0, ans=0.125 2024-08-17 19:14:07,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3470220.0, ans=0.125 2024-08-17 19:14:10,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3470220.0, ans=0.0 2024-08-17 19:14:11,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.89 vs. limit=15.0 2024-08-17 19:14:26,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.95 vs. limit=10.0 2024-08-17 19:14:27,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3900, loss[loss=0.1073, beats_loss=0.004987, ecapa_loss=0.0001448, whisper_loss=0.1008, over 15180.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001481, whisper_loss=0.09128, over 3875743.43 frames. ], batch size: 54, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:14:28,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.35 vs. 
limit=22.5 2024-08-17 19:14:30,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=3470420.0, ans=0.1 2024-08-17 19:14:36,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.48 vs. limit=10.0 2024-08-17 19:14:37,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=8.0 2024-08-17 19:14:40,493 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-17 19:14:51,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3470520.0, ans=0.125 2024-08-17 19:14:51,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3470520.0, ans=0.0 2024-08-17 19:15:17,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3470720.0, ans=0.125 2024-08-17 19:15:35,749 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.330e+01 2.586e+01 2.887e+01 6.168e+01, threshold=5.173e+01, percent-clipped=1.0 2024-08-17 19:15:35,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 3950, loss[loss=0.1031, beats_loss=0.01061, ecapa_loss=0.000149, whisper_loss=0.09097, over 22417.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001482, whisper_loss=0.09116, over 3887215.82 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:16:02,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3471120.0, ans=0.125 2024-08-17 19:16:10,159 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-17 19:16:19,434 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 19:16:35,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3471320.0, ans=0.0 2024-08-17 19:16:41,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2024-08-17 19:16:43,053 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4000, loss[loss=0.09112, beats_loss=0.008541, ecapa_loss=0.0001963, whisper_loss=0.08062, over 16044.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.0001479, whisper_loss=0.09164, over 3895094.72 frames. ], batch size: 69, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:16:47,079 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 19:17:07,556 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.57 vs. limit=15.0 2024-08-17 19:17:16,550 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 19:17:16,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3471620.0, ans=0.125 2024-08-17 19:17:23,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3471720.0, ans=0.125 2024-08-17 19:17:23,978 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 33 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 19:17:24,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.85 vs. 
limit=12.0 2024-08-17 19:17:44,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.303e+01 2.468e+01 2.762e+01 4.613e+01, threshold=4.936e+01, percent-clipped=0.0 2024-08-17 19:17:44,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4050, loss[loss=0.1011, beats_loss=0.01109, ecapa_loss=0.0001699, whisper_loss=0.0883, over 16820.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001487, whisper_loss=0.09154, over 3893473.80 frames. ], batch size: 70, lr: 2.57e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:17:46,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3471920.0, ans=0.125 2024-08-17 19:18:06,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3472020.0, ans=0.95 2024-08-17 19:18:24,030 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 19:18:25,544 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 19:18:36,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.16 vs. limit=15.0 2024-08-17 19:18:47,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4100, loss[loss=0.0686, beats_loss=0.01028, ecapa_loss=0.000173, whisper_loss=0.05659, over 13316.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.0104, ecapa_loss=0.0001493, whisper_loss=0.09176, over 3871880.82 frames. 
], batch size: 55, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:19:12,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3472620.0, ans=0.07 2024-08-17 19:19:35,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3472720.0, ans=0.5 2024-08-17 19:19:37,739 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-17 19:19:42,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.87 vs. limit=10.0 2024-08-17 19:19:47,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.72 vs. limit=22.5 2024-08-17 19:19:49,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.275e+01 2.466e+01 2.807e+01 1.119e+02, threshold=4.931e+01, percent-clipped=1.0 2024-08-17 19:19:49,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4150, loss[loss=0.06479, beats_loss=0.01175, ecapa_loss=0.0001617, whisper_loss=0.05143, over 20166.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01045, ecapa_loss=0.0001489, whisper_loss=0.09158, over 3888517.73 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:19:54,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3472920.0, ans=0.125 2024-08-17 19:19:57,580 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 19:20:01,384 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-17 19:20:23,700 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-17 19:20:25,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-08-17 19:20:52,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4200, loss[loss=0.08956, beats_loss=0.01184, ecapa_loss=0.0001545, whisper_loss=0.07618, over 19284.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001488, whisper_loss=0.09111, over 3906526.89 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:20:59,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3473420.0, ans=0.0 2024-08-17 19:21:19,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3473620.0, ans=0.05 2024-08-17 19:21:23,881 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 33 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 19:21:25,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3473620.0, ans=0.125 2024-08-17 19:21:45,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3473820.0, ans=0.125 2024-08-17 19:21:55,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.224e+01 2.485e+01 2.719e+01 4.060e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-17 19:21:55,026 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4250, loss[loss=0.08185, beats_loss=0.008498, ecapa_loss=0.0001625, whisper_loss=0.07172, over 16437.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001497, whisper_loss=0.0904, over 3861697.21 frames. 
], batch size: 62, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:22:02,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3473920.0, ans=0.125 2024-08-17 19:22:38,579 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.513e+01 2024-08-17 19:22:53,747 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2024-08-17 19:22:58,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4300, loss[loss=0.1113, beats_loss=0.01141, ecapa_loss=0.0001537, whisper_loss=0.09832, over 22761.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001492, whisper_loss=0.09029, over 3872207.81 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:23:07,089 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 19:23:14,260 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 19:23:18,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2024-08-17 19:23:21,941 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
20 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-17 19:23:22,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3474620.0, ans=0.125 2024-08-17 19:23:42,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3474720.0, ans=0.1 2024-08-17 19:23:44,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3474720.0, ans=0.125 2024-08-17 19:23:55,758 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 19:23:57,145 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 19:24:00,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=15.0 2024-08-17 19:24:01,022 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.274e+01 2.524e+01 2.891e+01 1.236e+02, threshold=5.048e+01, percent-clipped=2.0 2024-08-17 19:24:01,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4350, loss[loss=0.1149, beats_loss=0.009694, ecapa_loss=0.0001475, whisper_loss=0.1037, over 19289.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001487, whisper_loss=0.09012, over 3860444.01 frames. ], batch size: 74, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:24:01,215 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-17 19:24:06,059 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 19:24:35,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. 
limit=15.0 2024-08-17 19:24:41,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3475220.0, ans=0.125 2024-08-17 19:24:42,797 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-17 19:24:57,136 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-17 19:25:03,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.05 vs. limit=15.0 2024-08-17 19:25:03,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4400, loss[loss=0.09967, beats_loss=0.01101, ecapa_loss=0.0001616, whisper_loss=0.08705, over 22335.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001485, whisper_loss=0.09004, over 3859573.58 frames. ], batch size: 95, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:25:12,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3475420.0, ans=10.0 2024-08-17 19:25:24,198 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0026171719655394554, model_norm_threshold=50.480224609375 2024-08-17 19:25:24,364 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.667e+08, grad_sumsq=1.636e+10, orig_rms_sq=1.019e-02 2024-08-17 19:25:25,754 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
23 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-17 19:26:05,666 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.336e+01 2.604e+01 2.872e+01 1.929e+04, threshold=5.209e+01, percent-clipped=1.0 2024-08-17 19:26:05,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4450, loss[loss=0.1265, beats_loss=0.01019, ecapa_loss=0.0001536, whisper_loss=0.1148, over 23829.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001473, whisper_loss=0.08977, over 3874012.27 frames. ], batch size: 94, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:26:24,590 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-17 19:26:38,324 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-17 19:26:52,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=12.0 2024-08-17 19:26:53,582 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 19:27:08,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4500, loss[loss=0.1064, beats_loss=0.01073, ecapa_loss=0.0001772, whisper_loss=0.09393, over 21486.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001478, whisper_loss=0.09012, over 3857790.50 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:27:09,639 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 19:27:14,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.01 vs. 
limit=15.0 2024-08-17 19:27:17,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3476420.0, ans=0.05 2024-08-17 19:27:23,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3476520.0, ans=0.0 2024-08-17 19:27:24,189 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-17 19:27:29,001 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-17 19:27:37,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3476620.0, ans=0.125 2024-08-17 19:27:53,058 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05500377342104912, model_norm_threshold=52.087669372558594 2024-08-17 19:27:53,227 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.09, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.494e+04, grad_sumsq=1.471e+05, orig_rms_sq=5.773e-01 2024-08-17 19:27:59,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3476820.0, ans=0.0 2024-08-17 19:28:05,443 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 31 from LS+wenet, 14 from Vox, 40 fro AS 2024-08-17 19:28:09,269 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 19:28:10,283 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.357e+01 2.568e+01 2.877e+01 9.470e+02, threshold=5.136e+01, percent-clipped=3.0 2024-08-17 19:28:10,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4550, loss[loss=0.1043, beats_loss=0.01017, ecapa_loss=0.0001362, whisper_loss=0.09281, over 14406.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001479, whisper_loss=0.09061, over 3854821.59 frames. ], batch size: 56, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:28:18,145 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-17 19:28:43,833 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-17 19:28:48,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-17 19:28:56,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3477220.0, ans=0.2 2024-08-17 19:29:07,285 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-17 19:29:10,619 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-17 19:29:14,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3477320.0, ans=0.1 2024-08-17 19:29:17,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4600, loss[loss=0.09242, beats_loss=0.01101, ecapa_loss=0.0001656, whisper_loss=0.07976, over 22549.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001496, whisper_loss=0.08994, over 3825728.56 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:29:41,376 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-17 19:29:48,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3477620.0, ans=0.0 2024-08-17 19:30:12,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3477820.0, ans=0.125 2024-08-17 19:30:24,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.235e+01 2.492e+01 2.713e+01 4.136e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-17 19:30:24,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4650, loss[loss=0.08421, beats_loss=0.01256, ecapa_loss=0.0001704, whisper_loss=0.06994, over 17782.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01057, ecapa_loss=0.0001501, whisper_loss=0.08926, over 3842333.43 frames. ], batch size: 75, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:30:27,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3477920.0, ans=0.0 2024-08-17 19:30:29,963 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 19:30:42,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3478020.0, ans=0.1 2024-08-17 19:31:00,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3478120.0, ans=10.0 2024-08-17 19:31:02,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3478120.0, ans=0.125 2024-08-17 19:31:09,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.94 vs. limit=22.5 2024-08-17 19:31:10,059 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-17 19:31:22,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3478320.0, ans=0.125 2024-08-17 19:31:26,990 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-17 19:31:32,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4700, loss[loss=0.08984, beats_loss=0.01192, ecapa_loss=0.0001735, whisper_loss=0.07618, over 21216.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01057, ecapa_loss=0.0001485, whisper_loss=0.08946, over 3879448.44 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:31:32,604 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-17 19:31:34,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3478420.0, ans=0.0 2024-08-17 19:31:37,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3478420.0, ans=0.125 2024-08-17 19:31:43,972 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-17 19:31:51,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3478520.0, ans=0.1 2024-08-17 19:31:52,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3478520.0, ans=0.035 2024-08-17 19:31:53,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.59 vs. 
limit=15.0 2024-08-17 19:31:56,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3478520.0, ans=0.125 2024-08-17 19:32:14,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3478720.0, ans=0.2 2024-08-17 19:32:28,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3478820.0, ans=10.0 2024-08-17 19:32:39,303 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.685e+01 2.375e+01 2.548e+01 2.840e+01 1.213e+02, threshold=5.097e+01, percent-clipped=2.0 2024-08-17 19:32:39,326 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4750, loss[loss=0.09994, beats_loss=0.0122, ecapa_loss=0.0001372, whisper_loss=0.08637, over 22178.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001482, whisper_loss=0.08966, over 3852299.81 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:32:43,428 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 31 from Vox, 23 fro AS 2024-08-17 19:32:54,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3479020.0, ans=0.125 2024-08-17 19:33:04,704 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 36 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 19:33:35,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.24 vs. 
limit=15.0 2024-08-17 19:33:38,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3479320.0, ans=0.1 2024-08-17 19:33:42,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-08-17 19:33:46,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4800, loss[loss=0.1158, beats_loss=0.009679, ecapa_loss=0.0001604, whisper_loss=0.1045, over 22065.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001492, whisper_loss=0.09016, over 3862941.65 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:33:52,706 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-17 19:33:58,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3479520.0, ans=0.0 2024-08-17 19:34:00,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3479520.0, ans=0.125 2024-08-17 19:34:04,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2024-08-17 19:34:08,890 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 19:34:20,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3479620.0, ans=0.125 2024-08-17 19:34:22,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3479620.0, ans=0.07 2024-08-17 19:34:30,053 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
15 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-17 19:34:33,773 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-17 19:34:36,613 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-17 19:34:52,001 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.389e+01 2.627e+01 2.841e+01 3.778e+01, threshold=5.254e+01, percent-clipped=0.0 2024-08-17 19:34:52,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4850, loss[loss=0.08436, beats_loss=0.01113, ecapa_loss=0.0001016, whisper_loss=0.07222, over 15232.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.000149, whisper_loss=0.09004, over 3862496.80 frames. ], batch size: 56, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:35:03,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3479920.0, ans=0.125 2024-08-17 19:35:04,897 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-17 19:35:13,815 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
15 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-17 19:35:24,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3480120.0, ans=0.2 2024-08-17 19:35:30,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3480120.0, ans=0.1 2024-08-17 19:35:30,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3480120.0, ans=0.125 2024-08-17 19:35:34,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3480220.0, ans=0.1 2024-08-17 19:35:37,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3480220.0, ans=0.1 2024-08-17 19:35:39,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.73 vs. limit=15.0 2024-08-17 19:35:50,606 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-17 19:35:57,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3480320.0, ans=0.0 2024-08-17 19:35:59,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4900, loss[loss=0.09353, beats_loss=0.01231, ecapa_loss=0.0001258, whisper_loss=0.07996, over 23141.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01062, ecapa_loss=0.0001482, whisper_loss=0.09021, over 3858838.90 frames. ], batch size: 93, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:36:17,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3480520.0, ans=0.09899494936611666 2024-08-17 19:36:34,871 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 19:36:39,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3480720.0, ans=0.125 2024-08-17 19:36:55,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3480820.0, ans=0.0 2024-08-17 19:37:04,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.256e+01 2.549e+01 2.782e+01 4.181e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-17 19:37:04,918 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 4950, loss[loss=0.1098, beats_loss=0.009944, ecapa_loss=0.0001683, whisper_loss=0.09818, over 18634.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001483, whisper_loss=0.08989, over 3880369.06 frames. ], batch size: 74, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:37:17,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3481020.0, ans=0.2 2024-08-17 19:37:23,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.07 vs. limit=22.5 2024-08-17 19:37:37,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.83 vs. limit=15.0 2024-08-17 19:38:08,638 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5000, loss[loss=0.09078, beats_loss=0.01136, ecapa_loss=0.0002063, whisper_loss=0.07736, over 15275.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001487, whisper_loss=0.09007, over 3873865.17 frames. ], batch size: 65, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:38:16,348 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 19:38:30,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3481520.0, ans=0.125 2024-08-17 19:38:35,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=8.0 2024-08-17 19:38:39,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3481620.0, ans=0.0 2024-08-17 19:38:50,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-08-17 19:39:09,779 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.326e+01 2.618e+01 2.940e+01 4.201e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-17 19:39:09,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5050, loss[loss=0.11, beats_loss=0.009617, ecapa_loss=0.0001457, whisper_loss=0.09888, over 22897.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001481, whisper_loss=0.09015, over 3888545.62 frames. ], batch size: 91, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:39:37,807 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 19:39:49,854 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:39:52,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3482220.0, ans=0.1 2024-08-17 19:39:59,214 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-17 19:39:59,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3482220.0, ans=0.125 2024-08-17 19:40:15,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2024-08-17 19:40:16,444 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-17 19:40:17,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5100, loss[loss=0.09512, beats_loss=0.01061, ecapa_loss=0.0001196, whisper_loss=0.08331, over 17909.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001477, whisper_loss=0.09057, over 3896873.79 frames. ], batch size: 68, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:40:26,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3482420.0, ans=0.035 2024-08-17 19:40:26,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3482420.0, ans=0.125 2024-08-17 19:40:34,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3482520.0, ans=0.5 2024-08-17 19:40:50,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3482620.0, ans=0.1 2024-08-17 19:41:11,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3482720.0, ans=0.0 2024-08-17 19:41:11,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3482720.0, ans=0.125 2024-08-17 19:41:30,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 
2.372e+01 2.547e+01 2.927e+01 4.256e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-17 19:41:30,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5150, loss[loss=0.09421, beats_loss=0.01089, ecapa_loss=0.0001369, whisper_loss=0.08195, over 17792.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001468, whisper_loss=0.09063, over 3897672.07 frames. ], batch size: 71, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:41:35,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3482920.0, ans=0.125 2024-08-17 19:41:35,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3482920.0, ans=0.5 2024-08-17 19:41:39,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3482920.0, ans=0.09899494936611666 2024-08-17 19:41:39,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3482920.0, ans=0.125 2024-08-17 19:42:06,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3483120.0, ans=0.125 2024-08-17 19:42:08,917 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 19:42:21,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0 2024-08-17 19:42:23,506 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 19:42:41,537 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 19:42:47,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5200, loss[loss=0.09876, beats_loss=0.009731, ecapa_loss=0.0001705, whisper_loss=0.08732, over 18360.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001463, whisper_loss=0.09094, over 3878564.05 frames. ], batch size: 74, lr: 2.56e-03, grad_scale: 1.152921504606847e+18 2024-08-17 19:43:05,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3483520.0, ans=0.0 2024-08-17 19:43:47,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3483720.0, ans=0.09899494936611666 2024-08-17 19:43:49,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3483720.0, ans=0.0 2024-08-17 19:43:57,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3483820.0, ans=0.0 2024-08-17 19:43:58,550 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 19:44:03,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3483820.0, ans=0.1 2024-08-17 19:44:06,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5250, loss[loss=0.1178, beats_loss=0.01059, ecapa_loss=0.0001258, whisper_loss=0.106, over 16496.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001471, whisper_loss=0.09069, over 3850277.10 frames. ], batch size: 61, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:44:07,892 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.262e+01 2.523e+01 2.892e+01 1.179e+02, threshold=5.045e+01, percent-clipped=2.0 2024-08-17 19:44:27,868 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 19:44:44,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3484220.0, ans=0.125 2024-08-17 19:44:53,766 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-17 19:44:54,863 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 19:45:11,191 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5300, loss[loss=0.1069, beats_loss=0.009698, ecapa_loss=0.0001378, whisper_loss=0.09583, over 23603.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001485, whisper_loss=0.09123, over 3846330.44 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:45:21,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3484420.0, ans=0.1 2024-08-17 19:45:36,538 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-17 19:45:42,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2024-08-17 19:45:43,874 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-17 19:45:50,027 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-17 19:45:57,777 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 19:45:57,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3484720.0, ans=0.0 2024-08-17 19:46:09,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.62 vs. 
limit=12.0 2024-08-17 19:46:12,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3484820.0, ans=0.0 2024-08-17 19:46:14,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5350, loss[loss=0.1059, beats_loss=0.01153, ecapa_loss=0.0001313, whisper_loss=0.09301, over 23492.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.000147, whisper_loss=0.09102, over 3835883.81 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:46:15,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.678e+01 2.274e+01 2.526e+01 2.902e+01 3.170e+02, threshold=5.052e+01, percent-clipped=3.0 2024-08-17 19:46:27,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3485020.0, ans=0.125 2024-08-17 19:47:02,423 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-17 19:47:03,637 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 25 from Vox, 21 fro AS 2024-08-17 19:47:18,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5400, loss[loss=0.0945, beats_loss=0.0105, ecapa_loss=0.0001317, whisper_loss=0.08269, over 17893.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001466, whisper_loss=0.09075, over 3851619.45 frames. ], batch size: 70, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:47:18,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-17 19:47:21,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3485420.0, ans=0.0 2024-08-17 19:47:33,571 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
27 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 19:47:34,934 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 19:47:43,141 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-17 19:47:44,341 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 19:47:57,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3485620.0, ans=0.125 2024-08-17 19:48:09,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. limit=22.5 2024-08-17 19:48:29,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5450, loss[loss=0.1066, beats_loss=0.007593, ecapa_loss=0.0001781, whisper_loss=0.09726, over 18149.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01036, ecapa_loss=0.0001469, whisper_loss=0.09172, over 3850128.57 frames. ], batch size: 71, lr: 2.56e-03, grad_scale: 5.764607523034235e+17 2024-08-17 19:48:30,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.407e+01 2.677e+01 2.868e+01 8.391e+01, threshold=5.355e+01, percent-clipped=1.0 2024-08-17 19:48:32,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3485920.0, ans=0.0 2024-08-17 19:48:50,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3486020.0, ans=0.0 2024-08-17 19:49:40,799 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
21 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 19:49:41,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3486320.0, ans=0.05 2024-08-17 19:49:46,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5500, loss[loss=0.1105, beats_loss=0.01013, ecapa_loss=0.0001634, whisper_loss=0.0987, over 16137.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001473, whisper_loss=0.09098, over 3851221.05 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:49:50,561 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-17 19:49:53,853 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 19:49:59,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3486420.0, ans=0.125 2024-08-17 19:50:19,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3486620.0, ans=0.125 2024-08-17 19:50:19,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3486620.0, ans=0.0 2024-08-17 19:50:30,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=3486620.0, ans=0.02 2024-08-17 19:50:31,683 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 19:50:59,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3486820.0, ans=0.2 2024-08-17 19:51:08,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5550, loss[loss=0.09446, beats_loss=0.01213, ecapa_loss=0.0001643, whisper_loss=0.08069, over 21176.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001464, whisper_loss=0.09063, over 3873855.28 frames. ], batch size: 94, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:51:11,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.339e+01 2.599e+01 2.892e+01 2.617e+02, threshold=5.198e+01, percent-clipped=1.0 2024-08-17 19:51:14,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3486920.0, ans=0.1 2024-08-17 19:51:21,165 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 19:51:25,853 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-17 19:51:31,874 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-17 19:51:32,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3487020.0, ans=0.125 2024-08-17 19:51:33,045 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 19:51:37,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3487020.0, ans=0.125 2024-08-17 19:51:55,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3487220.0, ans=0.0 2024-08-17 19:51:59,095 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 19:52:01,495 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 19:52:05,778 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.773e+00 2024-08-17 19:52:10,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3487320.0, ans=0.125 2024-08-17 19:52:13,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3487320.0, ans=0.035 2024-08-17 19:52:19,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3487320.0, ans=0.0 2024-08-17 19:52:21,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5600, loss[loss=0.09954, beats_loss=0.01203, ecapa_loss=0.0001217, whisper_loss=0.0863, over 22696.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001471, whisper_loss=0.09105, over 3870618.05 frames. ], batch size: 92, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:52:25,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3487420.0, ans=0.125 2024-08-17 19:52:47,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-17 19:52:51,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3487620.0, ans=0.125 2024-08-17 19:53:10,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.17 vs. limit=10.0 2024-08-17 19:53:18,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. 
limit=15.0 2024-08-17 19:53:20,622 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 19:53:26,082 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5650, loss[loss=0.1238, beats_loss=0.007234, ecapa_loss=0.0001961, whisper_loss=0.1146, over 20018.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01053, ecapa_loss=0.0001475, whisper_loss=0.09014, over 3900948.07 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:53:28,916 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.385e+01 2.591e+01 2.978e+01 4.325e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-17 19:53:32,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.97 vs. limit=15.0 2024-08-17 19:53:33,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3487920.0, ans=0.0 2024-08-17 19:53:49,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3488020.0, ans=0.125 2024-08-17 19:53:52,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=3488120.0, ans=10.0 2024-08-17 19:54:09,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488220.0, ans=0.1 2024-08-17 19:54:13,613 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 19:54:31,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5700, loss[loss=0.09452, beats_loss=0.01441, ecapa_loss=8.514e-05, whisper_loss=0.07925, over 19498.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001468, whisper_loss=0.09024, over 3888125.03 frames. 
], batch size: 73, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:54:51,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3488520.0, ans=0.1 2024-08-17 19:55:04,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3488620.0, ans=0.125 2024-08-17 19:55:14,564 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 19:55:16,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-17 19:55:21,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3488720.0, ans=0.0 2024-08-17 19:55:25,898 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-08-17 19:55:28,596 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=1.288e-02 2024-08-17 19:55:35,805 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5750, loss[loss=0.1039, beats_loss=0.01112, ecapa_loss=0.0001354, whisper_loss=0.09145, over 19414.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001476, whisper_loss=0.09025, over 3899420.45 frames. 
], batch size: 76, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:55:38,279 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.375e+01 2.666e+01 3.139e+01 4.378e+01, threshold=5.332e+01, percent-clipped=0.0 2024-08-17 19:55:41,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3488920.0, ans=0.125 2024-08-17 19:55:47,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. limit=15.0 2024-08-17 19:56:10,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3489120.0, ans=0.0 2024-08-17 19:56:39,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5800, loss[loss=0.09423, beats_loss=0.01029, ecapa_loss=0.000202, whisper_loss=0.08192, over 15303.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001481, whisper_loss=0.09041, over 3895922.77 frames. ], batch size: 66, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:56:42,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3489420.0, ans=0.125 2024-08-17 19:56:49,740 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 19:57:16,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.13 vs. limit=15.0 2024-08-17 19:57:24,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3489720.0, ans=0.015 2024-08-17 19:57:43,682 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-17 19:57:45,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5850, loss[loss=0.08263, beats_loss=0.01312, ecapa_loss=0.0001566, whisper_loss=0.06793, over 18171.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.09076, over 3890946.79 frames. ], batch size: 74, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:57:48,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.235e+01 2.518e+01 2.723e+01 7.884e+01, threshold=5.036e+01, percent-clipped=1.0 2024-08-17 19:57:56,854 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-17 19:57:57,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3489920.0, ans=0.2 2024-08-17 19:58:29,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3490220.0, ans=0.125 2024-08-17 19:58:31,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3490220.0, ans=0.2 2024-08-17 19:58:44,791 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-17 19:58:51,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5900, loss[loss=0.08015, beats_loss=0.009999, ecapa_loss=0.0001597, whisper_loss=0.06856, over 22002.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01064, ecapa_loss=0.000147, whisper_loss=0.08949, over 3871741.78 frames. 
], batch size: 90, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:58:52,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3490420.0, ans=0.125 2024-08-17 19:58:56,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.59 vs. limit=15.0 2024-08-17 19:58:59,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3490420.0, ans=0.0 2024-08-17 19:59:00,312 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 19:59:01,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3490420.0, ans=0.2 2024-08-17 19:59:14,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3490520.0, ans=0.125 2024-08-17 19:59:30,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-17 19:59:31,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3490720.0, ans=0.07 2024-08-17 19:59:37,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.41 vs. limit=22.5 2024-08-17 19:59:45,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.24 vs. 
limit=22.5 2024-08-17 19:59:48,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3490820.0, ans=0.0 2024-08-17 19:59:49,694 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 19:59:49,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3490820.0, ans=0.0 2024-08-17 19:59:54,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3490820.0, ans=0.125 2024-08-17 19:59:56,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 5950, loss[loss=0.08067, beats_loss=0.01109, ecapa_loss=0.0001268, whisper_loss=0.06832, over 17033.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01067, ecapa_loss=0.0001469, whisper_loss=0.08931, over 3904100.47 frames. ], batch size: 69, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 19:59:59,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.262e+01 2.461e+01 2.864e+01 3.747e+01, threshold=4.922e+01, percent-clipped=0.0 2024-08-17 20:00:05,145 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 20:00:21,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.03 vs. limit=15.0 2024-08-17 20:00:26,361 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 20:00:28,565 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08441564440727234, model_norm_threshold=49.215904235839844 2024-08-17 20:00:29,014 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.653e+04, grad_sumsq=3.653e+04, orig_rms_sq=1.000e+00 2024-08-17 20:00:29,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3491120.0, ans=0.1 2024-08-17 20:00:33,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2024-08-17 20:00:43,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3491220.0, ans=0.1 2024-08-17 20:00:46,672 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 20:01:04,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6000, loss[loss=0.09167, beats_loss=0.01223, ecapa_loss=0.0001263, whisper_loss=0.07818, over 22422.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01072, ecapa_loss=0.0001467, whisper_loss=0.08931, over 3910990.23 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:01:04,679 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 20:01:38,367 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on ASR_libri: loss=0.2507, beats_loss=0, ecapa_loss=0.000535, whisper_loss=0.2453, over 922467.00 frames. 2024-08-17 20:01:55,971 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on SV_voxceleb1: loss=0.00412, beats_loss=0, ecapa_loss=0.000412, whisper_loss=0, over 939242.00 frames. 
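The `Clipping_scale=2.0` records above report grad-norm quartiles plus a threshold, and in this excerpt the threshold consistently equals `clipping_scale` times the logged median norm (e.g. 2.0 × 2.523e+01 ≈ the logged `threshold=5.045e+01`; likewise the 19:59:59 `threshold=4.922e+01` matches the `model_norm_threshold=49.2159` in the WARNING that follows it, which rescales an out-of-range gradient by `threshold / grad_norm ≈ 0.0844`). A minimal sketch of that median-relative clipping scheme, with hypothetical helper names rather than icefall's actual `optim.py` internals:

```python
import statistics

def clip_by_median(grad_norm_history, grad_norm, clipping_scale=2.0):
    """Scale gradients down when their norm exceeds clipping_scale
    times the median of recently observed grad norms."""
    threshold = clipping_scale * statistics.median(grad_norm_history)
    # Multiplier applied to every gradient tensor; 1.0 means no clipping.
    scale = min(1.0, threshold / grad_norm)
    return threshold, scale

# Quartile values reported at 19:44:07 (min, Q1, median, Q3, max)
history = [1.832e+01, 2.262e+01, 2.523e+01, 2.892e+01, 1.179e+02]
threshold, scale = clip_by_median(history, grad_norm=30.0)
print(round(threshold, 2))  # 50.46, matching the logged threshold=5.045e+01
print(scale)                # 1.0 -> this step's gradient is left untouched
```

Separately, the `grad_scale` column steps through exact powers of two across the excerpt (1.1529e+18 = 2^60, then 2^59, then 2^58), consistent with the usual AMP loss-scale backoff halving the scale after an overflow step.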
2024-08-17 20:03:40,255 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on AT_audioset: loss=0.02332, beats_loss=0.02332, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 20:03:40,259 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 20:03:40,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3491420.0, ans=0.2 2024-08-17 20:03:48,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3491420.0, ans=0.125 2024-08-17 20:03:52,963 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-17 20:03:59,670 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 20:04:10,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3491620.0, ans=0.2 2024-08-17 20:04:12,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3491620.0, ans=0.125 2024-08-17 20:04:22,952 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 20:04:27,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.20 vs. limit=22.5 2024-08-17 20:04:42,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3491820.0, ans=0.025 2024-08-17 20:04:44,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6050, loss[loss=0.09058, beats_loss=0.01447, ecapa_loss=0.0001078, whisper_loss=0.07504, over 21366.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01073, ecapa_loss=0.0001462, whisper_loss=0.08935, over 3923260.42 frames. 
], batch size: 88, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:04:45,108 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 20:04:47,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.332e+01 2.568e+01 2.986e+01 5.830e+02, threshold=5.136e+01, percent-clipped=1.0 2024-08-17 20:04:53,669 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-17 20:04:56,624 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 20:05:09,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3492020.0, ans=0.0 2024-08-17 20:05:38,665 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-17 20:05:49,077 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:05:50,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3492320.0, ans=0.1 2024-08-17 20:05:55,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6100, loss[loss=0.06028, beats_loss=0.01093, ecapa_loss=0.000163, whisper_loss=0.04771, over 12811.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01072, ecapa_loss=0.0001465, whisper_loss=0.08982, over 3914072.10 frames. ], batch size: 53, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:06:02,179 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-17 20:06:08,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3492420.0, ans=0.0 2024-08-17 20:06:10,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3492520.0, ans=0.0 2024-08-17 20:06:17,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3492520.0, ans=0.125 2024-08-17 20:06:21,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3492520.0, ans=0.0 2024-08-17 20:06:30,904 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-17 20:06:33,541 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 20:06:36,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3492720.0, ans=0.125 2024-08-17 20:06:36,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3492720.0, ans=0.125 2024-08-17 20:06:43,491 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-17 20:06:59,159 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 20:07:06,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6150, loss[loss=0.08703, beats_loss=0.01203, ecapa_loss=0.0001301, whisper_loss=0.0737, over 18751.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.0001463, whisper_loss=0.09051, over 3948337.15 frames. 
], batch size: 75, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:07:08,589 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.377e+01 2.638e+01 2.970e+01 6.161e+01, threshold=5.275e+01, percent-clipped=1.0 2024-08-17 20:07:21,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.33 vs. limit=15.0 2024-08-17 20:07:36,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3493120.0, ans=0.125 2024-08-17 20:08:11,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6200, loss[loss=0.08996, beats_loss=0.01219, ecapa_loss=0.000166, whisper_loss=0.07611, over 17881.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01078, ecapa_loss=0.0001467, whisper_loss=0.08983, over 3924366.63 frames. ], batch size: 73, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:08:13,388 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-17 20:08:15,978 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:08:18,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3493420.0, ans=0.125 2024-08-17 20:08:52,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3493720.0, ans=0.125 2024-08-17 20:08:58,939 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-17 20:09:02,393 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 26 from Vox, 38 from AS
2024-08-17 20:09:12,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3493820.0, ans=0.125
2024-08-17 20:09:16,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6250, loss[loss=0.1215, beats_loss=0.007104, ecapa_loss=0.0001555, whisper_loss=0.1129, over 22813.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01072, ecapa_loss=0.0001466, whisper_loss=0.0893, over 3915741.56 frames. ], batch size: 86, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:09:19,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.313e+01 2.518e+01 2.765e+01 5.277e+01, threshold=5.035e+01, percent-clipped=1.0
2024-08-17 20:09:43,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=22.5
2024-08-17 20:10:01,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3494220.0, ans=0.0
2024-08-17 20:10:01,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0
2024-08-17 20:10:23,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6300, loss[loss=0.1082, beats_loss=0.009615, ecapa_loss=0.0001505, whisper_loss=0.09708, over 19142.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01072, ecapa_loss=0.0001466, whisper_loss=0.08913, over 3887251.09 frames. ], batch size: 72, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:10:24,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3494420.0, ans=0.125
2024-08-17 20:10:32,786 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 18 from Vox, 46 from AS
2024-08-17 20:10:39,051 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 26 from Vox, 39 from AS
2024-08-17 20:10:41,626 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-17 20:10:55,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3494620.0, ans=0.125
2024-08-17 20:11:09,406 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0
2024-08-17 20:11:18,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3494820.0, ans=0.125
2024-08-17 20:11:28,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6350, loss[loss=0.0861, beats_loss=0.01244, ecapa_loss=0.0001244, whisper_loss=0.07242, over 16981.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01074, ecapa_loss=0.0001463, whisper_loss=0.08914, over 3877158.39 frames. ], batch size: 67, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:11:29,787 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS
2024-08-17 20:11:30,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.269e+01 2.498e+01 3.027e+01 2.064e+02, threshold=4.996e+01, percent-clipped=1.0
2024-08-17 20:11:35,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3494920.0, ans=0.2
2024-08-17 20:11:37,630 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 from AS
2024-08-17 20:11:48,016 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 20 from Vox, 39 from AS
2024-08-17 20:12:15,862 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.611e+05
2024-08-17 20:12:31,788 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6400, loss[loss=0.1224, beats_loss=0.008377, ecapa_loss=0.0001817, whisper_loss=0.1122, over 19422.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01069, ecapa_loss=0.0001467, whisper_loss=0.08982, over 3863697.77 frames. ], batch size: 81, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:12:35,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3495420.0, ans=0.125
2024-08-17 20:12:37,063 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 15 from Vox, 28 from AS
2024-08-17 20:12:41,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0
2024-08-17 20:12:53,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3495520.0, ans=0.1
2024-08-17 20:13:03,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3495620.0, ans=0.125
2024-08-17 20:13:17,665 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 23 from Vox, 30 from AS
2024-08-17 20:13:17,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3495720.0, ans=0.125
2024-08-17 20:13:35,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6450, loss[loss=0.09963, beats_loss=0.009739, ecapa_loss=0.0001324, whisper_loss=0.08857, over 15134.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01072, ecapa_loss=0.0001467, whisper_loss=0.09009, over 3866725.95 frames. ], batch size: 60, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:13:38,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.330e+01 2.566e+01 2.943e+01 3.819e+01, threshold=5.133e+01, percent-clipped=0.0
2024-08-17 20:14:01,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3496120.0, ans=0.125
2024-08-17 20:14:13,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3496220.0, ans=0.0
2024-08-17 20:14:34,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3496320.0, ans=0.0
2024-08-17 20:14:38,784 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6500, loss[loss=0.1071, beats_loss=0.01201, ecapa_loss=0.0001138, whisper_loss=0.09399, over 20404.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01073, ecapa_loss=0.0001463, whisper_loss=0.09041, over 3888422.12 frames. ], batch size: 79, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:14:39,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3496420.0, ans=0.125
2024-08-17 20:14:46,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.55 vs. limit=12.0
2024-08-17 20:14:46,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. limit=22.5
2024-08-17 20:14:48,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3496420.0, ans=0.125
2024-08-17 20:14:51,439 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 24 from Vox, 32 from AS
2024-08-17 20:14:56,672 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS
2024-08-17 20:15:22,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3496720.0, ans=0.04949747468305833
2024-08-17 20:15:23,868 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS
2024-08-17 20:15:24,998 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 23 from Vox, 27 from AS
2024-08-17 20:15:25,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3496720.0, ans=0.0
2024-08-17 20:15:25,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.35 vs. limit=12.0
2024-08-17 20:15:32,640 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 from AS
2024-08-17 20:15:41,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6550, loss[loss=0.1069, beats_loss=0.01118, ecapa_loss=0.0001368, whisper_loss=0.09431, over 22440.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01077, ecapa_loss=0.0001464, whisper_loss=0.09044, over 3898137.56 frames. ], batch size: 90, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:15:43,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3496920.0, ans=0.1
2024-08-17 20:15:43,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.275e+01 2.582e+01 2.844e+01 5.439e+01, threshold=5.163e+01, percent-clipped=1.0
2024-08-17 20:15:54,885 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 29 from LS+wenet, 22 from Vox, 45 from AS
2024-08-17 20:16:02,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3497020.0, ans=0.0
2024-08-17 20:16:10,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3497120.0, ans=0.0
2024-08-17 20:16:19,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3497220.0, ans=0.125
2024-08-17 20:16:19,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3497220.0, ans=0.125
2024-08-17 20:16:24,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3497220.0, ans=0.0
2024-08-17 20:16:24,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.36 vs. limit=10.0
2024-08-17 20:16:25,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3497220.0, ans=0.0
2024-08-17 20:16:27,974 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 21 from Vox, 24 from AS
2024-08-17 20:16:32,869 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS
2024-08-17 20:16:44,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6600, loss[loss=0.1142, beats_loss=0.01025, ecapa_loss=0.000165, whisper_loss=0.1023, over 20964.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01068, ecapa_loss=0.0001477, whisper_loss=0.09066, over 3928309.70 frames. ], batch size: 87, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:16:45,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.49 vs. limit=15.0
2024-08-17 20:16:51,708 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 from AS
2024-08-17 20:17:09,246 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 18 from Vox, 34 from AS
2024-08-17 20:17:13,647 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.188e-03
2024-08-17 20:17:35,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3497820.0, ans=0.125
2024-08-17 20:17:38,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3497820.0, ans=0.1
2024-08-17 20:17:41,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5
2024-08-17 20:17:44,248 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS
2024-08-17 20:17:47,605 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 from AS
2024-08-17 20:17:48,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=22.5
2024-08-17 20:17:48,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6650, loss[loss=0.1042, beats_loss=0.01008, ecapa_loss=0.0001857, whisper_loss=0.09222, over 16168.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001482, whisper_loss=0.09043, over 3917084.61 frames. ], batch size: 66, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:17:49,074 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 16 from Vox, 29 from AS
2024-08-17 20:17:52,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.296e+01 2.528e+01 2.803e+01 4.875e+01, threshold=5.056e+01, percent-clipped=0.0
2024-08-17 20:17:53,749 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 14 from Vox, 19 from AS
2024-08-17 20:18:01,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3498020.0, ans=0.125
2024-08-17 20:18:21,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3498120.0, ans=0.125
2024-08-17 20:18:25,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3498120.0, ans=0.125
2024-08-17 20:18:44,026 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 21 from Vox, 38 from AS
2024-08-17 20:18:55,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6700, loss[loss=0.06336, beats_loss=0.01266, ecapa_loss=0.0001162, whisper_loss=0.04954, over 16524.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01063, ecapa_loss=0.0001481, whisper_loss=0.09038, over 3895806.93 frames. ], batch size: 64, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:19:07,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3498520.0, ans=0.125
2024-08-17 20:19:27,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3498620.0, ans=0.125
2024-08-17 20:19:32,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3498620.0, ans=0.125
2024-08-17 20:19:41,875 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 19 from Vox, 36 from AS
2024-08-17 20:19:58,971 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 25 from Vox, 26 from AS
2024-08-17 20:20:02,618 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6750, loss[loss=0.1171, beats_loss=0.008759, ecapa_loss=0.0001703, whisper_loss=0.1067, over 16236.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001494, whisper_loss=0.09031, over 3849388.22 frames. ], batch size: 63, lr: 2.56e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:20:05,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.341e+01 2.553e+01 2.881e+01 4.288e+01, threshold=5.105e+01, percent-clipped=0.0
2024-08-17 20:20:23,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3499020.0, ans=0.1
2024-08-17 20:20:32,529 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 17 from Vox, 15 from AS
2024-08-17 20:20:33,902 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 17 from Vox, 21 from AS
2024-08-17 20:20:47,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3499220.0, ans=0.125
2024-08-17 20:20:56,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3499320.0, ans=0.04949747468305833
2024-08-17 20:20:57,046 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 from AS
2024-08-17 20:21:09,333 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 38 from LS+wenet, 19 from Vox, 33 from AS
2024-08-17 20:21:10,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6800, loss[loss=0.1289, beats_loss=0.008014, ecapa_loss=0.0001627, whisper_loss=0.1192, over 23108.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001499, whisper_loss=0.09054, over 3868496.02 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:21:20,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3499420.0, ans=0.125
2024-08-17 20:21:28,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3499520.0, ans=0.1
2024-08-17 20:21:34,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3499520.0, ans=0.0
2024-08-17 20:21:41,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3499620.0, ans=0.125
2024-08-17 20:21:41,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3499620.0, ans=0.09899494936611666
2024-08-17 20:21:41,966 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0
2024-08-17 20:21:54,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3499720.0, ans=0.0
2024-08-17 20:22:02,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.56 vs. limit=8.0
2024-08-17 20:22:04,817 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 from AS
2024-08-17 20:22:06,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3499820.0, ans=0.2
2024-08-17 20:22:12,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3499820.0, ans=0.2
2024-08-17 20:22:19,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6850, loss[loss=0.07174, beats_loss=0.01077, ecapa_loss=0.0001212, whisper_loss=0.05976, over 16718.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001499, whisper_loss=0.09038, over 3873008.40 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:22:22,025 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.292e+01 2.494e+01 2.760e+01 3.944e+01, threshold=4.988e+01, percent-clipped=0.0
2024-08-17 20:22:23,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3499920.0, ans=0.0
2024-08-17 20:22:31,465 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS
2024-08-17 20:22:46,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3500120.0, ans=0.025
2024-08-17 20:22:59,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3500220.0, ans=0.2
2024-08-17 20:23:02,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3500220.0, ans=0.125
2024-08-17 20:23:15,141 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 from AS
2024-08-17 20:23:20,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3500320.0, ans=0.0
2024-08-17 20:23:23,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3500320.0, ans=0.2
2024-08-17 20:23:28,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6900, loss[loss=0.09562, beats_loss=0.0107, ecapa_loss=0.0001551, whisper_loss=0.08337, over 17858.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001496, whisper_loss=0.09029, over 3883551.20 frames. ], batch size: 75, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:23:29,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3500420.0, ans=0.125
2024-08-17 20:23:36,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0
2024-08-17 20:23:50,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3500520.0, ans=0.125
2024-08-17 20:23:55,509 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 17 from Vox, 19 from AS
2024-08-17 20:23:59,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3500620.0, ans=0.2
2024-08-17 20:24:01,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.48 vs. limit=22.5
2024-08-17 20:24:03,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3500620.0, ans=0.0
2024-08-17 20:24:08,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.61 vs. limit=15.0
2024-08-17 20:24:30,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=3500820.0, ans=15.0
2024-08-17 20:24:35,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3500920.0, ans=0.0
2024-08-17 20:24:36,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 6950, loss[loss=0.1159, beats_loss=0.008667, ecapa_loss=0.0001589, whisper_loss=0.1056, over 20200.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001497, whisper_loss=0.09076, over 3863981.25 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:24:39,298 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.285e+01 2.467e+01 2.826e+01 3.694e+01, threshold=4.933e+01, percent-clipped=0.0
2024-08-17 20:25:08,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3501120.0, ans=0.125
2024-08-17 20:25:16,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3501220.0, ans=0.0
2024-08-17 20:25:17,747 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 from AS
2024-08-17 20:25:25,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=22.5
2024-08-17 20:25:38,762 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 from AS
2024-08-17 20:25:43,691 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7000, loss[loss=0.09555, beats_loss=0.01047, ecapa_loss=0.0001363, whisper_loss=0.08372, over 23612.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001489, whisper_loss=0.09039, over 3863623.78 frames. ], batch size: 92, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:25:50,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3501420.0, ans=0.0
2024-08-17 20:25:53,014 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 19 from Vox, 29 from AS
2024-08-17 20:25:55,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3501420.0, ans=0.125
2024-08-17 20:25:56,247 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 13 from Vox, 40 from AS
2024-08-17 20:25:58,869 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 from AS
2024-08-17 20:26:00,194 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 from AS
2024-08-17 20:26:06,918 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 from AS
2024-08-17 20:26:20,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0
2024-08-17 20:26:31,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3501720.0, ans=0.125
2024-08-17 20:26:31,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5
2024-08-17 20:26:36,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3501720.0, ans=0.125
2024-08-17 20:26:46,672 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 from AS
2024-08-17 20:26:53,303 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7050, loss[loss=0.1087, beats_loss=0.01057, ecapa_loss=0.0001443, whisper_loss=0.09666, over 17173.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001486, whisper_loss=0.09073, over 3868303.45 frames. ], batch size: 67, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:26:56,150 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.324e+01 2.539e+01 2.808e+01 3.658e+01, threshold=5.079e+01, percent-clipped=0.0
2024-08-17 20:28:04,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7100, loss[loss=0.09749, beats_loss=0.01072, ecapa_loss=0.000156, whisper_loss=0.08521, over 21516.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.000148, whisper_loss=0.09126, over 3894634.21 frames. ], batch size: 88, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:28:16,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3502420.0, ans=0.125
2024-08-17 20:28:16,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=22.5
2024-08-17 20:28:17,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3502520.0, ans=0.0
2024-08-17 20:28:40,861 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 17 from Vox, 41 from AS
2024-08-17 20:28:44,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3502620.0, ans=0.0
2024-08-17 20:28:52,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=12.0
2024-08-17 20:29:05,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3502820.0, ans=0.1
2024-08-17 20:29:10,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0
2024-08-17 20:29:13,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7150, loss[loss=0.09994, beats_loss=0.009605, ecapa_loss=0.0001504, whisper_loss=0.08883, over 19463.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001469, whisper_loss=0.09096, over 3906102.91 frames. ], batch size: 76, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:29:16,398 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.338e+01 2.584e+01 3.067e+01 1.427e+02, threshold=5.169e+01, percent-clipped=2.0
2024-08-17 20:29:16,583 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 from AS
2024-08-17 20:29:18,649 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 15 from Vox, 34 from AS
2024-08-17 20:29:33,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3503020.0, ans=0.125
2024-08-17 20:29:50,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3503120.0, ans=0.95
2024-08-17 20:29:51,614 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 from AS
2024-08-17 20:29:53,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3503120.0, ans=0.125
2024-08-17 20:30:06,278 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 from AS
2024-08-17 20:30:06,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3503220.0, ans=0.1
2024-08-17 20:30:09,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3503320.0, ans=0.0
2024-08-17 20:30:25,191 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7200, loss[loss=0.09117, beats_loss=0.01104, ecapa_loss=0.000128, whisper_loss=0.07885, over 19442.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001467, whisper_loss=0.09119, over 3898177.45 frames. ], batch size: 78, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:30:27,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0
2024-08-17 20:30:38,078 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0867324024438858, model_norm_threshold=51.6881103515625
2024-08-17 20:30:38,249 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.27, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.576e+04, grad_sumsq=9.576e+04, orig_rms_sq=1.000e+00
2024-08-17 20:30:41,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.33 vs. limit=10.0
2024-08-17 20:30:45,281 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 13 from LS+wenet, 21 from Vox, 35 from AS
2024-08-17 20:30:54,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3503620.0, ans=0.0
2024-08-17 20:31:26,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3503820.0, ans=0.2
2024-08-17 20:31:35,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7250, loss[loss=0.09656, beats_loss=0.01166, ecapa_loss=0.000166, whisper_loss=0.08323, over 21633.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001471, whisper_loss=0.09097, over 3919732.31 frames. ], batch size: 92, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:31:37,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5
2024-08-17 20:31:38,260 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.366e+01 2.657e+01 3.070e+01 5.959e+02, threshold=5.314e+01, percent-clipped=2.0
2024-08-17 20:31:52,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3504020.0, ans=0.125
2024-08-17 20:32:03,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3504120.0, ans=0.125
2024-08-17 20:32:22,733 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 27 from Vox, 44 from AS
2024-08-17 20:32:29,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3504220.0, ans=0.0
2024-08-17 20:32:29,529 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0
2024-08-17 20:32:45,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3504320.0, ans=0.05
2024-08-17 20:32:49,208 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7300, loss[loss=0.09393, beats_loss=0.01073, ecapa_loss=0.0001172, whisper_loss=0.08203, over 16236.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001463, whisper_loss=0.09112, over 3921347.93 frames. ], batch size: 61, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:32:52,486 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS
2024-08-17 20:32:58,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0
2024-08-17 20:33:07,049 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 28 from LS+wenet, 14 from Vox, 25 from AS
2024-08-17 20:33:08,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3504520.0, ans=0.0
2024-08-17 20:33:09,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0
2024-08-17 20:33:14,603 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 16 from Vox, 44 from AS
2024-08-17 20:33:30,469 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 25 from Vox, 30 from AS
2024-08-17 20:33:53,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3504820.0, ans=0.125
2024-08-17 20:33:59,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3504820.0, ans=0.0
2024-08-17 20:34:09,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7350, loss[loss=0.1193, beats_loss=0.009391, ecapa_loss=0.0001367, whisper_loss=0.1085, over 22896.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01053, ecapa_loss=0.0001466, whisper_loss=0.09216, over 3911631.50 frames. ], batch size: 87, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:34:12,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.308e+01 2.645e+01 2.863e+01 4.311e+01, threshold=5.290e+01, percent-clipped=0.0
2024-08-17 20:34:18,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3504920.0, ans=0.0
2024-08-17 20:34:20,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3504920.0, ans=0.2
2024-08-17 20:34:23,267 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 27 from Vox, 31 from AS
2024-08-17 20:34:33,229 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 28 from Vox, 41 from AS
2024-08-17 20:34:35,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3505020.0, ans=0.125
2024-08-17 20:34:41,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3505120.0, ans=0.125
2024-08-17 20:34:54,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3505120.0, ans=0.025
2024-08-17 20:35:04,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3505220.0, ans=0.0
2024-08-17 20:35:15,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3505320.0, ans=22.5
2024-08-17 20:35:24,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3505320.0, ans=0.125
2024-08-17 20:35:29,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7400, loss[loss=0.1092, beats_loss=0.01119, ecapa_loss=0.0001571, whisper_loss=0.09649, over 23078.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01055, ecapa_loss=0.0001472, whisper_loss=0.09181, over 3890267.11 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17
2024-08-17 20:35:35,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3505420.0, ans=0.0
2024-08-17 20:35:48,801 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 22 from Vox, 31 from AS
2024-08-17 20:35:50,174 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 from AS
2024-08-17 20:36:18,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3505720.0, ans=0.125
2024-08-17 20:36:23,928 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 from AS
2024-08-17 20:36:31,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3505820.0, ans=0.0
2024-08-17 20:36:33,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3505820.0, ans=0.125
2024-08-17 20:36:34,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3505820.0, ans=0.0
2024-08-17 20:36:37,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=12.0
2024-08-17 20:36:47,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7450, loss[loss=0.09126, beats_loss=0.007731, ecapa_loss=0.00016, whisper_loss=0.08193, over 15372.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01055, ecapa_loss=0.0001473, whisper_loss=0.09123, over 3863817.90 frames.
], batch size: 62, lr: 2.55e-03, grad_scale: 2.8823037615171174e+17 2024-08-17 20:36:48,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3505920.0, ans=0.0 2024-08-17 20:36:51,132 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.583e+01 2.401e+01 2.543e+01 2.763e+01 3.752e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-17 20:36:55,038 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:37:07,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3506020.0, ans=0.125 2024-08-17 20:37:30,188 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 20:37:50,334 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 24 from Vox, 17 fro AS 2024-08-17 20:37:54,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2024-08-17 20:38:02,223 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-17 20:38:06,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7500, loss[loss=0.1095, beats_loss=0.009437, ecapa_loss=0.0001708, whisper_loss=0.09832, over 22429.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01055, ecapa_loss=0.0001474, whisper_loss=0.09186, over 3899526.16 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:38:13,999 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-17 20:38:16,518 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 34 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-17 20:38:41,497 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-17 20:38:45,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3506620.0, ans=0.2 2024-08-17 20:38:49,140 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.685e+00 2024-08-17 20:38:56,009 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-17 20:39:09,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3506820.0, ans=0.1 2024-08-17 20:39:21,506 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=12.0 2024-08-17 20:39:22,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7550, loss[loss=0.09584, beats_loss=0.009896, ecapa_loss=0.0001797, whisper_loss=0.08414, over 17378.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01054, ecapa_loss=0.0001488, whisper_loss=0.09132, over 3870090.21 frames. ], batch size: 69, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:39:22,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3506920.0, ans=0.0 2024-08-17 20:39:24,769 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.340e+01 2.512e+01 2.890e+01 6.756e+01, threshold=5.024e+01, percent-clipped=1.0 2024-08-17 20:39:31,329 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-17 20:39:39,979 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-17 20:39:47,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. 
limit=15.0 2024-08-17 20:39:48,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3507020.0, ans=0.1 2024-08-17 20:40:13,709 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-17 20:40:15,351 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 11 from Vox, 24 fro AS 2024-08-17 20:40:15,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3507220.0, ans=0.125 2024-08-17 20:40:16,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=15.0 2024-08-17 20:40:37,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7600, loss[loss=0.1019, beats_loss=0.007956, ecapa_loss=0.0001615, whisper_loss=0.09234, over 13891.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001494, whisper_loss=0.09149, over 3856288.42 frames. ], batch size: 57, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:40:40,766 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-17 20:40:55,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3507520.0, ans=10.0 2024-08-17 20:40:57,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3507520.0, ans=0.0 2024-08-17 20:40:58,569 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0 2024-08-17 20:41:00,350 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-17 20:41:04,095 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
31 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 20:41:05,388 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 31 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 20:41:07,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3507620.0, ans=0.0 2024-08-17 20:41:11,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3507620.0, ans=0.1 2024-08-17 20:41:27,772 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 20:41:29,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3507720.0, ans=0.125 2024-08-17 20:41:49,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7650, loss[loss=0.1067, beats_loss=0.01039, ecapa_loss=0.0001303, whisper_loss=0.09498, over 20194.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01043, ecapa_loss=0.00015, whisper_loss=0.09167, over 3873098.30 frames. ], batch size: 79, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:41:52,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.316e+01 2.490e+01 2.754e+01 3.586e+01, threshold=4.980e+01, percent-clipped=0.0 2024-08-17 20:42:08,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. 
limit=6.0 2024-08-17 20:42:18,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3508120.0, ans=0.2 2024-08-17 20:42:21,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3508120.0, ans=0.125 2024-08-17 20:42:29,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3508120.0, ans=0.0 2024-08-17 20:42:30,860 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-17 20:43:00,593 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-17 20:43:02,384 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7700, loss[loss=0.1039, beats_loss=0.0121, ecapa_loss=0.0001162, whisper_loss=0.09063, over 23816.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001487, whisper_loss=0.09039, over 3880588.27 frames. ], batch size: 92, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:43:06,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3508420.0, ans=0.0 2024-08-17 20:43:25,552 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-17 20:43:36,170 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 20:43:59,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3508720.0, ans=0.125 2024-08-17 20:44:00,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.93 vs. 
limit=10.0 2024-08-17 20:44:20,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7750, loss[loss=0.1063, beats_loss=0.01019, ecapa_loss=0.0001611, whisper_loss=0.09445, over 21864.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.000148, whisper_loss=0.09032, over 3890605.13 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:44:23,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.352e+01 2.581e+01 3.039e+01 8.036e+01, threshold=5.163e+01, percent-clipped=1.0 2024-08-17 20:44:54,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3509120.0, ans=0.125 2024-08-17 20:44:57,804 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 20:45:09,973 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-17 20:45:13,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.45 vs. limit=15.0 2024-08-17 20:45:14,596 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-17 20:45:31,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3509320.0, ans=0.2 2024-08-17 20:45:32,385 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-17 20:45:36,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7800, loss[loss=0.1084, beats_loss=0.01226, ecapa_loss=0.0001196, whisper_loss=0.09495, over 23490.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001475, whisper_loss=0.09034, over 3881278.52 frames. 
], batch size: 94, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:45:43,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3509420.0, ans=0.125 2024-08-17 20:45:44,117 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-17 20:46:04,710 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 29 from Vox, 28 fro AS 2024-08-17 20:46:08,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3509620.0, ans=0.125 2024-08-17 20:46:31,659 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-17 20:46:32,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3509720.0, ans=0.125 2024-08-17 20:46:36,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3509820.0, ans=0.125 2024-08-17 20:46:39,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3509820.0, ans=0.125 2024-08-17 20:46:41,807 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-17 20:46:51,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7850, loss[loss=0.09166, beats_loss=0.01227, ecapa_loss=0.0001606, whisper_loss=0.07779, over 14217.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001472, whisper_loss=0.0902, over 3888665.52 frames. 
], batch size: 62, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:46:54,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.319e+01 2.575e+01 2.873e+01 4.382e+02, threshold=5.150e+01, percent-clipped=1.0 2024-08-17 20:47:17,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3510020.0, ans=0.2 2024-08-17 20:47:37,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3510220.0, ans=0.125 2024-08-17 20:47:57,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3510320.0, ans=0.125 2024-08-17 20:47:57,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3510320.0, ans=0.2 2024-08-17 20:48:03,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7900, loss[loss=0.1066, beats_loss=0.01142, ecapa_loss=0.0001487, whisper_loss=0.09371, over 22752.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001472, whisper_loss=0.09011, over 3914288.18 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:48:08,723 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-17 20:48:13,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3510420.0, ans=0.2 2024-08-17 20:48:18,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3510520.0, ans=0.07 2024-08-17 20:48:24,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3510520.0, ans=0.05 2024-08-17 20:48:30,334 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-17 20:48:41,667 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 40 from LS+wenet, 9 from Vox, 44 fro AS 2024-08-17 20:48:43,100 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-17 20:48:44,521 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 20:48:46,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=15.0 2024-08-17 20:48:48,974 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-17 20:48:49,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3510720.0, ans=0.125 2024-08-17 20:49:05,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3510820.0, ans=0.125 2024-08-17 20:49:05,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3510820.0, ans=0.2 2024-08-17 20:49:09,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3510820.0, ans=0.0 2024-08-17 20:49:14,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 7950, loss[loss=0.0903, beats_loss=0.01014, ecapa_loss=0.0001498, whisper_loss=0.07866, over 22515.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01065, ecapa_loss=0.0001475, whisper_loss=0.09026, over 3918534.65 frames. 
], batch size: 93, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:49:16,455 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.370e+01 2.553e+01 2.861e+01 6.638e+01, threshold=5.106e+01, percent-clipped=2.0 2024-08-17 20:49:17,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3510920.0, ans=0.125 2024-08-17 20:49:17,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3510920.0, ans=0.125 2024-08-17 20:49:26,707 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-17 20:49:36,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3511020.0, ans=0.2 2024-08-17 20:49:40,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3511020.0, ans=0.1 2024-08-17 20:49:44,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3511120.0, ans=0.0 2024-08-17 20:49:52,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3511120.0, ans=0.0 2024-08-17 20:50:02,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3511220.0, ans=0.125 2024-08-17 20:50:04,451 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:50:26,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8000, loss[loss=0.09972, beats_loss=0.01106, ecapa_loss=0.000172, whisper_loss=0.08694, over 18029.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01061, ecapa_loss=0.0001469, whisper_loss=0.09086, over 3919031.61 frames. 
], batch size: 72, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:50:32,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3511420.0, ans=0.1 2024-08-17 20:50:33,516 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 38 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-17 20:50:44,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3511520.0, ans=0.0 2024-08-17 20:50:49,579 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 20:50:54,102 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-17 20:51:06,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3511620.0, ans=0.0 2024-08-17 20:51:18,534 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-17 20:51:33,621 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-17 20:51:40,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8050, loss[loss=0.1264, beats_loss=0.0105, ecapa_loss=0.0001016, whisper_loss=0.1149, over 21306.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01057, ecapa_loss=0.0001477, whisper_loss=0.09163, over 3923952.09 frames. 
], batch size: 79, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:51:43,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3511920.0, ans=0.0 2024-08-17 20:51:44,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.251e+01 2.590e+01 2.848e+01 4.049e+01, threshold=5.180e+01, percent-clipped=0.0 2024-08-17 20:51:52,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.44 vs. limit=10.0 2024-08-17 20:52:04,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.28 vs. limit=12.0 2024-08-17 20:52:10,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3512120.0, ans=0.0 2024-08-17 20:52:23,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3512220.0, ans=0.125 2024-08-17 20:52:26,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5 2024-08-17 20:52:34,019 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 20:52:37,833 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-17 20:52:39,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3512320.0, ans=0.2 2024-08-17 20:52:48,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3512420.0, ans=0.0 2024-08-17 20:52:49,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8100, loss[loss=0.09349, beats_loss=0.01187, ecapa_loss=0.0001359, whisper_loss=0.08026, over 15436.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01062, ecapa_loss=0.000146, whisper_loss=0.09069, over 3892099.03 frames. ], batch size: 62, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:52:57,984 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 20:52:59,495 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0995599552989006, model_norm_threshold=51.80342102050781 2024-08-17 20:52:59,663 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.42, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.127e+05, grad_sumsq=1.103e+07, orig_rms_sq=1.022e-02 2024-08-17 20:53:01,270 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 20:53:14,226 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 20:53:17,170 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-17 20:53:18,666 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
20 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-17 20:53:25,009 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09344208240509033, model_norm_threshold=51.80342102050781 2024-08-17 20:53:25,471 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.conv_module1.depthwise_conv.causal_conv.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.259e+04, grad_sumsq=1.018e+05, orig_rms_sq=6.150e-01 2024-08-17 20:53:26,685 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-17 20:53:28,275 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 20:53:44,100 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 20:53:59,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8150, loss[loss=0.1044, beats_loss=0.01096, ecapa_loss=0.0001478, whisper_loss=0.09191, over 15103.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001462, whisper_loss=0.09101, over 3890623.52 frames. ], batch size: 60, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:54:02,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.389e+01 2.629e+01 2.995e+01 5.544e+02, threshold=5.257e+01, percent-clipped=3.0 2024-08-17 20:54:18,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2024-08-17 20:54:21,703 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 20:54:27,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3513120.0, ans=0.035 2024-08-17 20:54:29,773 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-17 20:54:33,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3513120.0, ans=0.0 2024-08-17 20:54:37,237 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-17 20:54:44,656 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-17 20:54:45,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3513220.0, ans=0.125 2024-08-17 20:54:50,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3513220.0, ans=0.0 2024-08-17 20:55:08,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8200, loss[loss=0.09908, beats_loss=0.00973, ecapa_loss=0.0001598, whisper_loss=0.08775, over 21864.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001469, whisper_loss=0.09036, over 3880456.74 frames. ], batch size: 93, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:55:13,825 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 20:55:19,599 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 20:55:21,694 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
18 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-17 20:55:25,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3513520.0, ans=0.1 2024-08-17 20:55:35,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3513620.0, ans=0.1 2024-08-17 20:56:14,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8250, loss[loss=0.08737, beats_loss=0.01062, ecapa_loss=0.0001461, whisper_loss=0.07528, over 17854.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001476, whisper_loss=0.09046, over 3864868.98 frames. ], batch size: 69, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:56:17,217 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.363e+01 2.592e+01 2.897e+01 5.680e+01, threshold=5.184e+01, percent-clipped=1.0 2024-08-17 20:56:18,794 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-17 20:56:22,101 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-17 20:56:27,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3514020.0, ans=0.2 2024-08-17 20:56:33,940 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 20:56:43,160 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
31 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 20:56:44,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3514120.0, ans=0.2 2024-08-17 20:56:52,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3514120.0, ans=0.125 2024-08-17 20:57:02,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3514220.0, ans=0.125 2024-08-17 20:57:12,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3514320.0, ans=0.125 2024-08-17 20:57:13,041 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-17 20:57:19,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8300, loss[loss=0.08836, beats_loss=0.0112, ecapa_loss=0.0001334, whisper_loss=0.07582, over 15752.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001462, whisper_loss=0.09041, over 3858116.33 frames. ], batch size: 64, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:57:23,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-17 20:57:33,167 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 20:57:49,544 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-17 20:57:50,844 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 20:57:59,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. 
limit=6.0 2024-08-17 20:58:11,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3514720.0, ans=0.1 2024-08-17 20:58:22,062 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 20:58:28,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8350, loss[loss=0.1005, beats_loss=0.01104, ecapa_loss=0.0001476, whisper_loss=0.08797, over 23206.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001466, whisper_loss=0.09063, over 3864545.98 frames. ], batch size: 94, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:58:32,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.299e+01 2.560e+01 2.760e+01 1.780e+02, threshold=5.121e+01, percent-clipped=1.0 2024-08-17 20:58:37,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3514920.0, ans=0.125 2024-08-17 20:58:52,642 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 20:58:56,863 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-17 20:58:58,018 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-17 20:58:59,617 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.61 vs. 
limit=22.5 2024-08-17 20:59:04,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3515120.0, ans=0.04949747468305833 2024-08-17 20:59:10,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3515220.0, ans=0.125 2024-08-17 20:59:30,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3515320.0, ans=0.1 2024-08-17 20:59:36,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3515320.0, ans=0.125 2024-08-17 20:59:40,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8400, loss[loss=0.1095, beats_loss=0.01151, ecapa_loss=8.515e-05, whisper_loss=0.09713, over 15520.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001459, whisper_loss=0.09083, over 3869432.70 frames. ], batch size: 55, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 20:59:41,801 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 20:59:53,286 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 20:59:56,026 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-17 21:00:03,478 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-17 21:00:28,762 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-17 21:00:48,567 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8450, loss[loss=0.09608, beats_loss=0.00993, ecapa_loss=0.0001798, whisper_loss=0.08435, over 15026.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001469, whisper_loss=0.09095, over 3878169.97 frames. 
], batch size: 63, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:00:51,503 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.340e+01 2.576e+01 2.813e+01 3.735e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-17 21:00:52,915 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 21:01:04,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3516020.0, ans=0.125 2024-08-17 21:01:10,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3516020.0, ans=0.125 2024-08-17 21:01:27,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3516120.0, ans=0.125 2024-08-17 21:01:37,260 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-17 21:01:39,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-08-17 21:01:41,704 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 21:01:49,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.25 vs. limit=22.5 2024-08-17 21:01:53,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.47 vs. limit=22.5 2024-08-17 21:01:59,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8500, loss[loss=0.08416, beats_loss=0.01249, ecapa_loss=0.0001559, whisper_loss=0.07011, over 15510.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.000146, whisper_loss=0.09084, over 3883971.44 frames. 
], batch size: 63, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:02:02,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3516420.0, ans=0.1 2024-08-17 21:02:18,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3516520.0, ans=0.2 2024-08-17 21:02:21,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3516520.0, ans=0.125 2024-08-17 21:02:39,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3516620.0, ans=0.0 2024-08-17 21:03:06,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3516820.0, ans=0.1 2024-08-17 21:03:12,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8550, loss[loss=0.1004, beats_loss=0.01043, ecapa_loss=0.0001653, whisper_loss=0.08832, over 22344.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01051, ecapa_loss=0.0001457, whisper_loss=0.0905, over 3896184.12 frames. 
], batch size: 92, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:03:13,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3516920.0, ans=10.0 2024-08-17 21:03:16,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.310e+01 2.634e+01 2.977e+01 2.577e+02, threshold=5.269e+01, percent-clipped=4.0 2024-08-17 21:03:29,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3517020.0, ans=0.0 2024-08-17 21:03:43,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3517120.0, ans=0.125 2024-08-17 21:03:47,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.76 vs. limit=15.0 2024-08-17 21:03:58,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3517220.0, ans=0.1 2024-08-17 21:03:59,611 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-17 21:04:07,351 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 21:04:14,952 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-17 21:04:21,569 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-17 21:04:24,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8600, loss[loss=0.09297, beats_loss=0.0136, ecapa_loss=0.0001028, whisper_loss=0.07834, over 21762.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001461, whisper_loss=0.09001, over 3913403.69 frames. 
], batch size: 86, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:04:26,877 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=12.0 2024-08-17 21:04:30,712 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-17 21:04:45,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.50 vs. limit=15.0 2024-08-17 21:04:48,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3517520.0, ans=0.125 2024-08-17 21:04:50,920 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-17 21:04:53,702 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-17 21:05:03,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3517620.0, ans=0.125 2024-08-17 21:05:20,502 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 21:05:23,811 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-17 21:05:30,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3517820.0, ans=22.5 2024-08-17 21:05:36,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8650, loss[loss=0.103, beats_loss=0.01109, ecapa_loss=0.0001495, whisper_loss=0.09043, over 23217.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01068, ecapa_loss=0.0001481, whisper_loss=0.08951, over 3860343.59 frames. 
], batch size: 94, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:05:39,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.019e+01 2.385e+01 2.642e+01 2.981e+01 2.241e+02, threshold=5.284e+01, percent-clipped=1.0 2024-08-17 21:05:51,273 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-17 21:05:54,439 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.072e-03 2024-08-17 21:05:56,265 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 21:06:32,905 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 21:06:33,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3518320.0, ans=0.0 2024-08-17 21:06:33,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3518320.0, ans=0.0 2024-08-17 21:06:35,594 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 29 from Vox, 25 fro AS 2024-08-17 21:06:49,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8700, loss[loss=0.09119, beats_loss=0.01041, ecapa_loss=0.0001513, whisper_loss=0.07927, over 20071.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01072, ecapa_loss=0.0001477, whisper_loss=0.08943, over 3857337.78 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:06:49,673 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
29 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-17 21:07:02,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3518420.0, ans=0.125 2024-08-17 21:07:19,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3518620.0, ans=0.0 2024-08-17 21:07:39,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3518720.0, ans=0.0 2024-08-17 21:07:41,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3518720.0, ans=0.125 2024-08-17 21:07:42,044 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 21:07:44,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3518720.0, ans=0.125 2024-08-17 21:07:46,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3518720.0, ans=0.0 2024-08-17 21:07:56,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=3518820.0, ans=0.2 2024-08-17 21:08:03,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8750, loss[loss=0.1016, beats_loss=0.01211, ecapa_loss=0.0001545, whisper_loss=0.08796, over 21917.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.000148, whisper_loss=0.09014, over 3885237.61 frames. ], batch size: 89, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:08:07,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.308e+01 2.508e+01 2.741e+01 1.105e+02, threshold=5.017e+01, percent-clipped=1.0 2024-08-17 21:08:13,651 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
36 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 21:08:22,442 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-17 21:08:27,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3519020.0, ans=0.1 2024-08-17 21:08:37,613 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 21:08:40,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3519120.0, ans=0.125 2024-08-17 21:08:42,889 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-17 21:08:59,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2024-08-17 21:09:20,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8800, loss[loss=0.115, beats_loss=0.009648, ecapa_loss=0.0001632, whisper_loss=0.1037, over 16128.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001476, whisper_loss=0.09072, over 3889228.86 frames. ], batch size: 62, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:09:21,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3519420.0, ans=0.0 2024-08-17 21:09:24,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3519420.0, ans=0.5 2024-08-17 21:09:38,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3519520.0, ans=0.125 2024-08-17 21:09:41,263 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-17 21:09:50,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3519620.0, ans=0.1 2024-08-17 21:10:11,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3519720.0, ans=0.2 2024-08-17 21:10:34,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8850, loss[loss=0.08954, beats_loss=0.01031, ecapa_loss=0.0001458, whisper_loss=0.07777, over 16797.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001481, whisper_loss=0.09041, over 3885641.78 frames. ], batch size: 68, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:10:36,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3519920.0, ans=0.0 2024-08-17 21:10:37,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.941e+01 2.338e+01 2.557e+01 2.876e+01 3.818e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-17 21:10:39,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3519920.0, ans=0.0 2024-08-17 21:10:59,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3520020.0, ans=0.2 2024-08-17 21:11:06,559 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 21:11:12,462 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
20 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-17 21:11:25,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3520220.0, ans=0.1 2024-08-17 21:11:42,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3520320.0, ans=0.0 2024-08-17 21:11:45,375 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-17 21:11:48,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3520320.0, ans=0.1 2024-08-17 21:11:50,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8900, loss[loss=0.1047, beats_loss=0.01031, ecapa_loss=0.0001559, whisper_loss=0.09283, over 22927.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001474, whisper_loss=0.08963, over 3858837.74 frames. ], batch size: 95, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:12:09,294 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 21:12:11,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3520520.0, ans=0.125 2024-08-17 21:12:16,694 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 41 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-17 21:12:24,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3520620.0, ans=0.1 2024-08-17 21:12:36,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3520720.0, ans=0.0 2024-08-17 21:12:42,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2024-08-17 21:12:47,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3520720.0, ans=0.1 2024-08-17 21:12:51,268 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-17 21:12:54,285 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-17 21:13:00,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3520820.0, ans=0.2 2024-08-17 21:13:05,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3520820.0, ans=0.125 2024-08-17 21:13:07,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 8950, loss[loss=0.1033, beats_loss=0.0119, ecapa_loss=0.0001352, whisper_loss=0.09008, over 22164.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01069, ecapa_loss=0.0001464, whisper_loss=0.08961, over 3857276.44 frames. ], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:13:10,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.279e+01 2.513e+01 2.850e+01 4.067e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-17 21:13:36,677 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 21:14:03,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3521220.0, ans=0.125 2024-08-17 21:14:18,178 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
32 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-17 21:14:21,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3521320.0, ans=0.125 2024-08-17 21:14:24,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3521320.0, ans=0.0 2024-08-17 21:14:26,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9000, loss[loss=0.09332, beats_loss=0.009734, ecapa_loss=0.0001485, whisper_loss=0.0821, over 17358.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001472, whisper_loss=0.08961, over 3855182.12 frames. ], batch size: 68, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:14:26,717 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 21:15:03,711 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on ASR_libri: loss=0.2507, beats_loss=0, ecapa_loss=0.0005281, whisper_loss=0.2454, over 922467.00 frames. 2024-08-17 21:15:12,434 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.2673, 1.5859, 2.0593, 1.0986, 1.3436, 1.5769, 1.9952, 1.9601], device='cuda:3') 2024-08-17 21:15:22,182 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on SV_voxceleb1: loss=0.004114, beats_loss=0, ecapa_loss=0.0004114, whisper_loss=0, over 939242.00 frames. 2024-08-17 21:16:53,445 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4884, 2.4182, 3.1153, 1.3700], device='cuda:3') 2024-08-17 21:17:01,739 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on AT_audioset: loss=0.02322, beats_loss=0.02322, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-17 21:17:01,743 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 21:17:06,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=15.0 2024-08-17 21:17:13,372 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-17 21:17:15,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3521520.0, ans=0.125 2024-08-17 21:17:24,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2024-08-17 21:17:29,952 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-17 21:17:47,438 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-17 21:17:52,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3521720.0, ans=0.0 2024-08-17 21:17:56,779 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-17 21:18:17,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9050, loss[loss=0.1044, beats_loss=0.01046, ecapa_loss=0.0001397, whisper_loss=0.09259, over 15731.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001476, whisper_loss=0.08994, over 3857553.66 frames. 
], batch size: 60, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:18:21,881 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.405e+01 2.622e+01 2.954e+01 2.025e+02, threshold=5.245e+01, percent-clipped=2.0 2024-08-17 21:18:22,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3521920.0, ans=0.1 2024-08-17 21:18:41,287 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2024-08-17 21:18:57,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3522120.0, ans=0.1 2024-08-17 21:19:06,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3522220.0, ans=0.125 2024-08-17 21:19:19,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2024-08-17 21:19:19,601 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-17 21:19:25,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2024-08-17 21:19:32,725 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 14 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 21:19:38,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9100, loss[loss=0.1088, beats_loss=0.007758, ecapa_loss=0.0001645, whisper_loss=0.09943, over 22770.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001482, whisper_loss=0.09041, over 3871901.21 frames. 
], batch size: 90, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:19:45,363 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:19:51,498 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-17 21:20:01,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3522520.0, ans=0.125 2024-08-17 21:20:06,079 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-17 21:20:30,543 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-17 21:20:33,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3522720.0, ans=0.0 2024-08-17 21:20:43,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3522820.0, ans=0.125 2024-08-17 21:20:43,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.31 vs. limit=15.0 2024-08-17 21:20:46,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-17 21:20:51,040 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9150, loss[loss=0.09042, beats_loss=0.01169, ecapa_loss=0.0001251, whisper_loss=0.07748, over 18161.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001483, whisper_loss=0.09006, over 3881866.49 frames. 
], batch size: 71, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:20:54,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.307e+01 2.548e+01 2.836e+01 3.815e+01, threshold=5.097e+01, percent-clipped=0.0 2024-08-17 21:20:54,148 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 21:21:21,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3523120.0, ans=0.5 2024-08-17 21:21:22,168 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-17 21:21:28,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3523120.0, ans=0.0 2024-08-17 21:21:53,510 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 21:22:01,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9200, loss[loss=0.1072, beats_loss=0.0116, ecapa_loss=0.0001734, whisper_loss=0.09392, over 22187.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001474, whisper_loss=0.08999, over 3892216.73 frames. ], batch size: 94, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:22:05,106 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-17 21:22:06,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3523420.0, ans=0.125 2024-08-17 21:22:09,085 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 21:22:14,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3523520.0, ans=0.125 2024-08-17 21:22:33,266 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-17 21:22:43,070 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 21:22:45,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3523720.0, ans=0.04949747468305833 2024-08-17 21:22:48,747 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-17 21:22:56,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3523820.0, ans=0.0 2024-08-17 21:22:57,849 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-17 21:23:04,014 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-17 21:23:05,116 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 21:23:06,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9250, loss[loss=0.1175, beats_loss=0.01007, ecapa_loss=0.0001643, whisper_loss=0.1058, over 17150.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001483, whisper_loss=0.08994, over 3877405.67 frames. 
], batch size: 70, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:23:08,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.350e+01 2.657e+01 3.037e+01 4.188e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-17 21:23:10,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3523920.0, ans=0.1 2024-08-17 21:23:11,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3523920.0, ans=0.125 2024-08-17 21:23:11,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3523920.0, ans=0.0 2024-08-17 21:23:30,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3524020.0, ans=0.1 2024-08-17 21:23:30,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3524020.0, ans=0.125 2024-08-17 21:23:39,940 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2024-08-17 21:23:47,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=12.0 2024-08-17 21:23:52,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3524220.0, ans=0.125 2024-08-17 21:24:13,198 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9300, loss[loss=0.0869, beats_loss=0.01163, ecapa_loss=0.0001312, whisper_loss=0.07396, over 19229.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001479, whisper_loss=0.09015, over 3896421.03 frames. 
], batch size: 79, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:24:23,675 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 17 from Vox, 22 from AS 2024-08-17 21:24:35,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3524520.0, ans=0.0 2024-08-17 21:24:36,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3524520.0, ans=0.0 2024-08-17 21:24:37,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3524620.0, ans=0.125 2024-08-17 21:24:55,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3524720.0, ans=0.0 2024-08-17 21:25:18,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9350, loss[loss=0.097, beats_loss=0.009378, ecapa_loss=0.0001496, whisper_loss=0.08612, over 21066.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001478, whisper_loss=0.08985, over 3907898.69 frames. ], batch size: 82, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:25:21,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.336e+01 2.549e+01 2.795e+01 4.217e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-17 21:25:30,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3524920.0, ans=10.0 2024-08-17 21:25:32,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3525020.0, ans=0.2 2024-08-17 21:25:33,200 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 from AS 2024-08-17 21:25:35,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.25 vs. 
limit=12.0 2024-08-17 21:26:01,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3525220.0, ans=0.2 2024-08-17 21:26:07,854 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.41 vs. limit=15.0 2024-08-17 21:26:07,975 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-17 21:26:18,480 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 19 from Vox, 25 from AS 2024-08-17 21:26:22,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2024-08-17 21:26:27,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9400, loss[loss=0.09348, beats_loss=0.01114, ecapa_loss=0.0001657, whisper_loss=0.08069, over 21044.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.08939, over 3921339.92 frames. ], batch size: 93, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:26:28,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3525420.0, ans=0.025 2024-08-17 21:26:33,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3525420.0, ans=0.125 2024-08-17 21:26:39,241 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 18 from Vox, 26 from AS 2024-08-17 21:26:47,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2024-08-17 21:26:58,350 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
15 from LS+wenet, 22 from Vox, 29 from AS 2024-08-17 21:27:07,875 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 12 from Vox, 53 from AS 2024-08-17 21:27:11,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3525720.0, ans=0.1 2024-08-17 21:27:12,184 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 from AS 2024-08-17 21:27:12,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3525720.0, ans=0.2 2024-08-17 21:27:21,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-17 21:27:25,425 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 from AS 2024-08-17 21:27:37,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9450, loss[loss=0.1135, beats_loss=0.01069, ecapa_loss=0.0001282, whisper_loss=0.1015, over 23429.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01071, ecapa_loss=0.0001474, whisper_loss=0.08887, over 3888620.01 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:27:41,294 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.632e+01 2.404e+01 2.620e+01 3.015e+01 5.071e+01, threshold=5.241e+01, percent-clipped=0.0 2024-08-17 21:27:41,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3525920.0, ans=0.125 2024-08-17 21:27:43,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3525920.0, ans=0.0 2024-08-17 21:27:55,794 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 20 from Vox, 29 from AS 2024-08-17 21:28:01,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3526020.0, ans=0.125 2024-08-17 21:28:10,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.57 vs. limit=15.0 2024-08-17 21:28:13,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3526120.0, ans=0.1 2024-08-17 21:28:15,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3526120.0, ans=0.0 2024-08-17 21:28:15,971 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 17 from Vox, 39 from AS 2024-08-17 21:28:20,167 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 22 from Vox, 21 from AS 2024-08-17 21:28:26,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.72 vs. limit=15.0 2024-08-17 21:28:34,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3526320.0, ans=0.09899494936611666 2024-08-17 21:28:38,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3526320.0, ans=0.125 2024-08-17 21:28:39,596 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 16 from Vox, 34 from AS 2024-08-17 21:28:48,272 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 from AS 2024-08-17 21:28:49,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9500, loss[loss=0.1109, beats_loss=0.011, ecapa_loss=0.0001265, whisper_loss=0.09866, over 23031.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01068, ecapa_loss=0.0001473, whisper_loss=0.08887, over 3848061.06 frames. ], batch size: 91, lr: 2.55e-03, grad_scale: 1.152921504606847e+18 2024-08-17 21:29:00,980 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 from AS 2024-08-17 21:29:07,992 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 from AS 2024-08-17 21:29:10,027 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:29:13,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3526520.0, ans=0.1 2024-08-17 21:29:17,655 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 from AS 2024-08-17 21:29:32,733 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 23 from Vox, 25 from AS 2024-08-17 21:29:58,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3526820.0, ans=0.125 2024-08-17 21:30:10,016 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9550, loss[loss=0.1095, beats_loss=0.01048, ecapa_loss=0.0001745, whisper_loss=0.09727, over 22282.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001474, whisper_loss=0.0894, over 3849974.55 frames. ], batch size: 93, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:30:11,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3526920.0, ans=0.0 2024-08-17 21:30:15,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.375e+01 2.611e+01 2.915e+01 4.364e+01, threshold=5.222e+01, percent-clipped=0.0 2024-08-17 21:30:16,326 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 27 from Vox, 39 from AS 2024-08-17 21:30:16,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3526920.0, ans=0.1 2024-08-17 21:31:03,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3527120.0, ans=0.125 2024-08-17 21:31:05,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3527220.0, ans=0.0 2024-08-17 21:31:17,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3527220.0, ans=0.1 2024-08-17 21:31:23,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.42 vs. limit=15.0 2024-08-17 21:31:29,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3527320.0, ans=10.0 2024-08-17 21:31:40,200 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 25 from Vox, 22 from AS 2024-08-17 21:31:44,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9600, loss[loss=0.09039, beats_loss=0.01192, ecapa_loss=0.000109, whisper_loss=0.07739, over 17453.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.08995, over 3844126.43 frames. 
], batch size: 66, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:31:44,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3527420.0, ans=0.125 2024-08-17 21:31:49,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3527420.0, ans=0.125 2024-08-17 21:31:51,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3527420.0, ans=0.2 2024-08-17 21:31:52,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3527420.0, ans=0.125 2024-08-17 21:31:53,200 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 from AS 2024-08-17 21:31:54,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2024-08-17 21:32:22,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.02 vs. limit=22.5 2024-08-17 21:32:24,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3527620.0, ans=0.125 2024-08-17 21:32:33,636 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 14 from Vox, 38 from AS 2024-08-17 21:32:33,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3527620.0, ans=0.5 2024-08-17 21:33:02,301 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
24 from LS+wenet, 21 from Vox, 35 from AS 2024-08-17 21:33:08,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3527820.0, ans=0.125 2024-08-17 21:33:23,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9650, loss[loss=0.0976, beats_loss=0.01198, ecapa_loss=0.0001238, whisper_loss=0.08439, over 21977.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001486, whisper_loss=0.09006, over 3852170.92 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:33:25,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3527920.0, ans=0.125 2024-08-17 21:33:29,200 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.363e+01 2.592e+01 2.956e+01 8.123e+01, threshold=5.183e+01, percent-clipped=2.0 2024-08-17 21:33:34,950 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 16 from Vox, 32 from AS 2024-08-17 21:33:54,078 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 26 from Vox, 25 from AS 2024-08-17 21:34:01,623 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:34:13,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3528120.0, ans=0.0 2024-08-17 21:34:20,945 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 21:34:32,298 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 14 from Vox, 24 from AS 2024-08-17 21:34:42,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3528320.0, ans=0.2 2024-08-17 21:34:44,755 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 15 from Vox, 32 from AS 2024-08-17 21:34:50,587 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 from AS 2024-08-17 21:35:00,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3528320.0, ans=10.0 2024-08-17 21:35:04,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9700, loss[loss=0.1064, beats_loss=0.009927, ecapa_loss=0.0001252, whisper_loss=0.09526, over 20359.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001487, whisper_loss=0.09059, over 3834449.80 frames. ], batch size: 79, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:35:17,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3528420.0, ans=0.0 2024-08-17 21:35:20,555 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 19 from Vox, 20 from AS 2024-08-17 21:35:26,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3528520.0, ans=0.125 2024-08-17 21:35:33,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3528620.0, ans=0.125 2024-08-17 21:35:47,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. limit=6.0 2024-08-17 21:36:15,853 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 21:36:16,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9750, loss[loss=0.1135, beats_loss=0.009704, ecapa_loss=0.0001391, whisper_loss=0.1024, over 23321.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001488, whisper_loss=0.09021, over 3817616.83 frames. 
], batch size: 93, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:36:18,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3528920.0, ans=0.0 2024-08-17 21:36:20,771 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.350e+01 2.623e+01 3.001e+01 4.380e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-17 21:36:40,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=3529020.0, ans=0.5 2024-08-17 21:36:49,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.82 vs. limit=12.0 2024-08-17 21:36:54,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3529120.0, ans=0.125 2024-08-17 21:37:09,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3529220.0, ans=0.125 2024-08-17 21:37:20,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3529320.0, ans=0.125 2024-08-17 21:37:26,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3529320.0, ans=0.125 2024-08-17 21:37:30,225 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9800, loss[loss=0.1046, beats_loss=0.01179, ecapa_loss=0.0001336, whisper_loss=0.09145, over 14084.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.000148, whisper_loss=0.08991, over 3797307.84 frames. 
], batch size: 55, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:37:37,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3529420.0, ans=0.0 2024-08-17 21:38:04,068 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 27 from Vox, 29 from AS 2024-08-17 21:38:12,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3529620.0, ans=0.0 2024-08-17 21:38:22,329 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 from AS 2024-08-17 21:38:22,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3529720.0, ans=0.125 2024-08-17 21:38:35,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3529820.0, ans=0.0 2024-08-17 21:38:36,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3529820.0, ans=0.125 2024-08-17 21:38:48,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9850, loss[loss=0.1038, beats_loss=0.009766, ecapa_loss=0.0001516, whisper_loss=0.09256, over 14011.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001473, whisper_loss=0.09016, over 3835293.41 frames. 
], batch size: 55, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:38:52,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.238e+01 2.528e+01 2.791e+01 4.527e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-17 21:38:55,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3529920.0, ans=0.125 2024-08-17 21:39:01,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3530020.0, ans=0.125 2024-08-17 21:39:07,636 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-17 21:39:33,328 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 from AS 2024-08-17 21:39:53,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3530320.0, ans=0.1 2024-08-17 21:39:53,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3530320.0, ans=0.1 2024-08-17 21:39:57,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2024-08-17 21:40:04,751 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9900, loss[loss=0.132, beats_loss=0.007757, ecapa_loss=0.000175, whisper_loss=0.1225, over 22161.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001473, whisper_loss=0.09068, over 3882542.45 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:40:04,914 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 from AS 2024-08-17 21:40:06,533 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 15 from Vox, 32 from AS 2024-08-17 21:40:11,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3530420.0, ans=0.125 2024-08-17 21:40:13,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-17 21:40:14,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3530420.0, ans=0.09899494936611666 2024-08-17 21:40:19,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3530520.0, ans=0.0 2024-08-17 21:40:48,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.41 vs. limit=15.0 2024-08-17 21:40:52,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.26 vs. limit=22.5 2024-08-17 21:41:18,117 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.44 vs. limit=12.0 2024-08-17 21:41:19,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 9950, loss[loss=0.1063, beats_loss=0.01034, ecapa_loss=0.0001195, whisper_loss=0.09477, over 19278.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001472, whisper_loss=0.09008, over 3863268.60 frames. ], batch size: 74, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:41:20,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.56 vs. limit=15.0 2024-08-17 21:41:21,189 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 30 from Vox, 35 from AS 2024-08-17 21:41:23,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.318e+01 2.487e+01 2.825e+01 4.305e+01, threshold=4.974e+01, percent-clipped=0.0 2024-08-17 21:41:25,326 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 19 from Vox, 39 from AS 2024-08-17 21:41:26,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3530920.0, ans=0.04949747468305833 2024-08-17 21:41:34,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2024-08-17 21:41:35,259 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:41:35,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3531020.0, ans=0.125 2024-08-17 21:42:15,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3531220.0, ans=0.0 2024-08-17 21:42:37,666 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.904e-01 2024-08-17 21:42:38,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10000, loss[loss=0.1009, beats_loss=0.0101, ecapa_loss=0.0001878, whisper_loss=0.08888, over 19991.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001486, whisper_loss=0.09001, over 3864374.20 frames. 
], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:42:43,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3531420.0, ans=0.0 2024-08-17 21:43:08,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3531520.0, ans=0.0 2024-08-17 21:43:14,026 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 from AS 2024-08-17 21:43:32,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3531720.0, ans=0.1 2024-08-17 21:43:32,899 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 20 from Vox, 24 from AS 2024-08-17 21:43:49,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3531820.0, ans=0.0 2024-08-17 21:43:54,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10050, loss[loss=0.09117, beats_loss=0.008253, ecapa_loss=0.0001946, whisper_loss=0.08097, over 18187.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001485, whisper_loss=0.08999, over 3875043.62 frames. ], batch size: 77, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:43:59,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.404e+01 2.606e+01 2.788e+01 4.457e+01, threshold=5.212e+01, percent-clipped=0.0 2024-08-17 21:44:06,674 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2024-08-17 21:44:09,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3532020.0, ans=0.1 2024-08-17 21:44:20,207 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
32 from LS+wenet, 10 from Vox, 34 from AS 2024-08-17 21:44:23,030 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 20 from Vox, 38 from AS 2024-08-17 21:44:31,085 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 29 from Vox, 42 from AS 2024-08-17 21:44:39,850 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09485381096601486, model_norm_threshold=52.118003845214844 2024-08-17 21:44:40,019 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.28, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.481e+04, grad_sumsq=8.481e+04, orig_rms_sq=1.000e+00 2024-08-17 21:44:41,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3532220.0, ans=0.125 2024-08-17 21:44:49,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3532220.0, ans=0.1 2024-08-17 21:44:57,414 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 from AS 2024-08-17 21:45:09,527 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 10 from LS+wenet, 24 from Vox, 20 from AS 2024-08-17 21:45:12,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10100, loss[loss=0.104, beats_loss=0.01013, ecapa_loss=0.0001435, whisper_loss=0.09243, over 18377.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001468, whisper_loss=0.09062, over 3903008.62 frames. ], batch size: 72, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:45:28,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3532520.0, ans=0.1 2024-08-17 21:45:33,572 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
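The WARNING at 21:44:39 ("Scaling gradients by 0.09485..., model_norm_threshold=52.118...") is consistent with scaling down an oversized gradient so its norm lands back at the threshold: the factor equals the threshold divided by the actual gradient norm. The implied norm here is about 549.5, the same order as the 5.495e+02 quartile maximum logged at 21:46:30. This interpretation and the helper name are ours, inferred from the logged numbers rather than from optim.py:

```python
# Sketch of the relation in the "Scaling gradients by ..." warning above:
# scale factor = model_norm_threshold / actual gradient norm (our inference).
def grad_scale_factor(grad_norm, model_norm_threshold):
    return model_norm_threshold / grad_norm

# Invert the logged pair to recover the gradient norm that triggered scaling:
implied_norm = 52.118003845214844 / 0.09485381096601486
print(round(implied_norm, 1))  # ~549.5
```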
24 from LS+wenet, 29 from Vox, 36 from AS 2024-08-17 21:45:35,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.15 vs. limit=15.0 2024-08-17 21:45:58,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3532720.0, ans=0.0 2024-08-17 21:46:00,494 INFO [train_multi_KD3.py:844] (3/4) A total of 98 cuts. 29 from LS+wenet, 26 from Vox, 43 from AS 2024-08-17 21:46:26,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10150, loss[loss=0.1092, beats_loss=0.01119, ecapa_loss=0.0001438, whisper_loss=0.0966, over 21651.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.09017, over 3904801.68 frames. ], batch size: 85, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:46:26,393 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 14 from Vox, 23 from AS 2024-08-17 21:46:30,188 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.357e+01 2.542e+01 2.923e+01 5.495e+02, threshold=5.084e+01, percent-clipped=3.0 2024-08-17 21:46:50,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3533020.0, ans=0.0 2024-08-17 21:46:57,885 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:47:03,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3533120.0, ans=0.125 2024-08-17 21:47:09,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3533220.0, ans=10.0 2024-08-17 21:47:12,265 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 17 from Vox, 34 from AS 2024-08-17 21:47:17,630 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-17 21:47:17,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-08-17 21:47:26,194 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 from AS 2024-08-17 21:47:28,753 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 from AS 2024-08-17 21:47:35,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-17 21:47:37,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10200, loss[loss=0.09756, beats_loss=0.01175, ecapa_loss=0.0001506, whisper_loss=0.08431, over 21072.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.000147, whisper_loss=0.09044, over 3910406.52 frames. ], batch size: 85, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:47:37,941 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 20 from LS+wenet, 25 from Vox, 48 from AS 2024-08-17 21:47:38,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3533420.0, ans=0.2 2024-08-17 21:47:41,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3533420.0, ans=0.125 2024-08-17 21:47:51,562 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 from AS 2024-08-17 21:48:49,020 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10250, loss[loss=0.07724, beats_loss=0.01239, ecapa_loss=0.0001978, whisper_loss=0.06288, over 20507.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001479, whisper_loss=0.09074, over 3933579.11 frames. ], batch size: 93, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:48:49,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3533920.0, ans=0.05 2024-08-17 21:48:52,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3533920.0, ans=0.0 2024-08-17 21:48:52,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.355e+01 2.589e+01 2.999e+01 4.439e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-17 21:49:08,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3534020.0, ans=0.1 2024-08-17 21:49:09,122 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-17 21:49:17,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3534120.0, ans=0.125 2024-08-17 21:49:19,477 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.70 vs. limit=10.0 2024-08-17 21:49:21,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3534120.0, ans=0.0 2024-08-17 21:49:29,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3534220.0, ans=0.0 2024-08-17 21:49:45,773 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
18 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-17 21:49:47,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3534320.0, ans=0.2 2024-08-17 21:49:54,284 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 21:49:57,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10300, loss[loss=0.1078, beats_loss=0.009075, ecapa_loss=0.0001689, whisper_loss=0.09707, over 14663.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001468, whisper_loss=0.09043, over 3924358.81 frames. ], batch size: 60, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:49:59,020 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 21:50:16,127 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 21:50:35,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3534620.0, ans=0.0 2024-08-17 21:50:48,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-08-17 21:50:58,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3534820.0, ans=0.0 2024-08-17 21:51:05,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10350, loss[loss=0.08854, beats_loss=0.009742, ecapa_loss=0.0001584, whisper_loss=0.07721, over 18031.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001477, whisper_loss=0.09051, over 3908989.86 frames. ], batch size: 74, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:51:07,418 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-17 21:51:09,954 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.272e+01 2.511e+01 2.861e+01 6.288e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-17 21:51:27,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3535020.0, ans=0.0 2024-08-17 21:51:30,194 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-17 21:51:45,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3535220.0, ans=0.125 2024-08-17 21:51:57,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3535220.0, ans=0.1 2024-08-17 21:52:01,345 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-17 21:52:03,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3535320.0, ans=0.125 2024-08-17 21:52:06,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3535320.0, ans=0.2 2024-08-17 21:52:07,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3535320.0, ans=0.125 2024-08-17 21:52:08,246 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-17 21:52:08,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3535320.0, ans=0.125 2024-08-17 21:52:13,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10400, loss[loss=0.1082, beats_loss=0.008809, ecapa_loss=0.0001445, whisper_loss=0.0979, over 17409.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001481, whisper_loss=0.09001, over 3909980.99 frames. ], batch size: 66, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:52:16,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3535420.0, ans=0.2 2024-08-17 21:52:19,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3535420.0, ans=0.125 2024-08-17 21:52:23,651 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-17 21:52:30,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3535520.0, ans=10.0 2024-08-17 21:52:32,737 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-17 21:52:35,630 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-17 21:52:38,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3535520.0, ans=0.2 2024-08-17 21:52:40,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3535620.0, ans=0.125 2024-08-17 21:52:50,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3535620.0, ans=0.1 2024-08-17 21:52:52,522 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-17 21:53:01,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. 
limit=6.0 2024-08-17 21:53:02,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3535720.0, ans=0.2 2024-08-17 21:53:19,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10450, loss[loss=0.1115, beats_loss=0.0104, ecapa_loss=0.0001295, whisper_loss=0.0998, over 19799.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.09046, over 3894581.16 frames. ], batch size: 79, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:53:22,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.244e+01 2.463e+01 2.760e+01 5.655e+01, threshold=4.925e+01, percent-clipped=1.0 2024-08-17 21:53:38,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3536020.0, ans=0.1 2024-08-17 21:53:52,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3536120.0, ans=0.0 2024-08-17 21:54:05,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3536220.0, ans=0.0 2024-08-17 21:54:10,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3536320.0, ans=0.125 2024-08-17 21:54:16,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3536320.0, ans=0.125 2024-08-17 21:54:24,183 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10500, loss[loss=0.1034, beats_loss=0.01146, ecapa_loss=0.0001482, whisper_loss=0.09041, over 16096.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001472, whisper_loss=0.09104, over 3902074.64 frames. 
], batch size: 66, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:54:24,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3536420.0, ans=0.125 2024-08-17 21:54:31,323 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 21:54:59,936 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-17 21:55:09,051 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 33 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-17 21:55:29,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10550, loss[loss=0.1212, beats_loss=0.01007, ecapa_loss=0.0001206, whisper_loss=0.1099, over 17743.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001479, whisper_loss=0.09109, over 3889232.78 frames. ], batch size: 66, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:55:33,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.376e+01 2.745e+01 3.046e+01 5.243e+01, threshold=5.490e+01, percent-clipped=1.0 2024-08-17 21:55:42,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3537020.0, ans=0.125 2024-08-17 21:56:04,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3537120.0, ans=0.05 2024-08-17 21:56:06,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3537220.0, ans=0.125 2024-08-17 21:56:14,242 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
22 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-17 21:56:20,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3537320.0, ans=0.125 2024-08-17 21:56:30,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=10.0 2024-08-17 21:56:33,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10600, loss[loss=0.1056, beats_loss=0.01103, ecapa_loss=0.0001083, whisper_loss=0.09348, over 22988.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01047, ecapa_loss=0.0001478, whisper_loss=0.0908, over 3875198.34 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:56:37,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3537420.0, ans=0.2 2024-08-17 21:56:37,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3537420.0, ans=0.0 2024-08-17 21:56:39,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.51 vs. limit=22.5 2024-08-17 21:56:43,995 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 27 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-17 21:56:58,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3537620.0, ans=0.125 2024-08-17 21:57:11,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3537720.0, ans=0.125 2024-08-17 21:57:21,926 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-17 21:57:37,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10650, loss[loss=0.1131, beats_loss=0.01007, ecapa_loss=0.0001312, whisper_loss=0.1017, over 23534.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001476, whisper_loss=0.09077, over 3890021.45 frames. ], batch size: 92, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:57:37,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3537920.0, ans=10.0 2024-08-17 21:57:40,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3537920.0, ans=0.2 2024-08-17 21:57:40,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.432e+01 2.682e+01 2.988e+01 4.178e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-17 21:57:45,205 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 21:57:55,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=3538020.0, ans=15.0 2024-08-17 21:57:56,895 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 21:58:09,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3538120.0, ans=0.0 2024-08-17 21:58:10,894 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 21:58:11,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3538120.0, ans=10.0 2024-08-17 21:58:13,257 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-17 21:58:13,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3538120.0, ans=0.1 2024-08-17 21:58:25,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3538220.0, ans=0.0 2024-08-17 21:58:41,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10700, loss[loss=0.1005, beats_loss=0.01204, ecapa_loss=0.0001094, whisper_loss=0.08742, over 20039.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001471, whisper_loss=0.09122, over 3902550.84 frames. ], batch size: 77, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:58:41,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3538420.0, ans=0.125 2024-08-17 21:58:49,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3538420.0, ans=0.05 2024-08-17 21:58:54,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3538520.0, ans=0.2 2024-08-17 21:59:22,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3538720.0, ans=0.125 2024-08-17 21:59:26,953 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 21:59:28,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3538720.0, ans=0.0 2024-08-17 21:59:31,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3538820.0, ans=0.0 2024-08-17 21:59:32,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3538820.0, ans=0.125 2024-08-17 21:59:36,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-17 21:59:44,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10750, loss[loss=0.09265, beats_loss=0.01271, ecapa_loss=0.0001239, whisper_loss=0.07871, over 17675.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001469, whisper_loss=0.09069, over 3876965.18 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 21:59:48,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.354e+01 2.532e+01 2.828e+01 4.238e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-17 21:59:48,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3538920.0, ans=0.09899494936611666 2024-08-17 22:00:00,529 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-17 22:00:02,997 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-17 22:00:21,666 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 35 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 22:00:34,615 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-17 22:00:41,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3539320.0, ans=0.5 2024-08-17 22:00:45,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-17 22:00:47,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10800, loss[loss=0.1023, beats_loss=0.01105, ecapa_loss=0.0001511, whisper_loss=0.08978, over 22112.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001454, whisper_loss=0.09095, over 3900503.18 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:00:52,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3539420.0, ans=0.125 2024-08-17 22:00:53,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3539420.0, ans=0.125 2024-08-17 22:00:54,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.0 2024-08-17 22:01:02,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3539520.0, ans=0.0 2024-08-17 22:01:06,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0 2024-08-17 22:01:11,627 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-17 22:01:12,448 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.44 vs. 
limit=15.0 2024-08-17 22:01:12,908 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-17 22:01:24,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3539720.0, ans=0.125 2024-08-17 22:01:31,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-17 22:01:33,150 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:01:34,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-08-17 22:01:36,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3539820.0, ans=0.0 2024-08-17 22:01:39,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3539820.0, ans=0.125 2024-08-17 22:01:42,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3539820.0, ans=0.125 2024-08-17 22:01:47,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3539820.0, ans=0.125 2024-08-17 22:01:50,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10850, loss[loss=0.09511, beats_loss=0.01091, ecapa_loss=0.0001476, whisper_loss=0.08272, over 21242.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01054, ecapa_loss=0.0001464, whisper_loss=0.09151, over 3919409.23 frames. 
], batch size: 86, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:01:54,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.336e+01 2.508e+01 2.767e+01 4.451e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-17 22:01:57,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3539920.0, ans=0.125 2024-08-17 22:01:58,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3539920.0, ans=0.125 2024-08-17 22:02:12,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3540020.0, ans=0.0 2024-08-17 22:02:24,983 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-17 22:02:31,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3540220.0, ans=0.125 2024-08-17 22:02:33,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3540220.0, ans=0.125 2024-08-17 22:02:34,983 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-17 22:02:49,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3540320.0, ans=0.125 2024-08-17 22:02:54,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10900, loss[loss=0.1054, beats_loss=0.01089, ecapa_loss=0.0001318, whisper_loss=0.09317, over 16334.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001459, whisper_loss=0.09141, over 3919759.65 frames. 
], batch size: 67, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:03:07,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3540520.0, ans=0.125 2024-08-17 22:03:12,388 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-17 22:03:16,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3540520.0, ans=0.125 2024-08-17 22:03:31,334 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-17 22:03:48,797 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-17 22:03:57,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 10950, loss[loss=0.1071, beats_loss=0.009672, ecapa_loss=0.0001465, whisper_loss=0.09596, over 21000.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09169, over 3919375.56 frames. 
], batch size: 83, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:03:59,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3540920.0, ans=0.125 2024-08-17 22:04:01,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.818e+01 2.429e+01 2.667e+01 3.020e+01 4.482e+01, threshold=5.334e+01, percent-clipped=0.0 2024-08-17 22:04:02,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3540920.0, ans=0.0 2024-08-17 22:04:06,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3540920.0, ans=10.0 2024-08-17 22:04:15,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3541020.0, ans=0.125 2024-08-17 22:04:16,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3541020.0, ans=0.125 2024-08-17 22:04:19,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.04 vs. limit=22.5 2024-08-17 22:04:26,147 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-17 22:04:27,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3541120.0, ans=0.2 2024-08-17 22:04:32,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3541120.0, ans=0.125 2024-08-17 22:04:44,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-08-17 22:04:48,846 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-17 22:05:00,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11000, loss[loss=0.1149, beats_loss=0.01103, ecapa_loss=0.0001394, whisper_loss=0.1025, over 21424.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001469, whisper_loss=0.09162, over 3891135.71 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:05:35,663 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-17 22:05:45,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3541720.0, ans=0.2 2024-08-17 22:05:49,512 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-17 22:05:50,801 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-17 22:06:02,606 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11050, loss[loss=0.1131, beats_loss=0.009868, ecapa_loss=0.0001678, whisper_loss=0.1016, over 23208.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01052, ecapa_loss=0.0001469, whisper_loss=0.09145, over 3911227.21 frames. ], batch size: 94, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:06:03,908 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-17 22:06:06,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.361e+01 2.552e+01 2.844e+01 4.106e+02, threshold=5.103e+01, percent-clipped=1.0 2024-08-17 22:06:07,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.64 vs. 
limit=22.5 2024-08-17 22:06:14,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3542020.0, ans=0.125 2024-08-17 22:06:29,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-17 22:06:36,300 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:06:38,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3542120.0, ans=0.125 2024-08-17 22:06:44,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3542220.0, ans=0.1 2024-08-17 22:06:47,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542220.0, ans=0.1 2024-08-17 22:07:05,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11100, loss[loss=0.1149, beats_loss=0.008715, ecapa_loss=0.0001375, whisper_loss=0.1048, over 22430.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0105, ecapa_loss=0.0001472, whisper_loss=0.0912, over 3896614.23 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:07:06,793 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 22:07:15,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3542420.0, ans=0.05 2024-08-17 22:07:16,810 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-17 22:07:46,157 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
28 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-17 22:07:59,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3542820.0, ans=0.125 2024-08-17 22:08:05,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3542820.0, ans=0.1 2024-08-17 22:08:05,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3542820.0, ans=0.0 2024-08-17 22:08:08,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11150, loss[loss=0.1249, beats_loss=0.009722, ecapa_loss=0.0001381, whisper_loss=0.1138, over 23515.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.0104, ecapa_loss=0.0001468, whisper_loss=0.09225, over 3919602.16 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:08:11,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.10 vs. limit=15.0 2024-08-17 22:08:12,139 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.728e+01 2.281e+01 2.502e+01 2.861e+01 4.409e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-17 22:08:19,573 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 22:08:23,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3543020.0, ans=0.125 2024-08-17 22:08:32,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3543120.0, ans=0.2 2024-08-17 22:08:36,040 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
22 from LS+wenet, 29 from Vox, 37 from AS 2024-08-17 22:08:46,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3543220.0, ans=0.125 2024-08-17 22:08:50,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3543220.0, ans=0.125 2024-08-17 22:08:56,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=3543220.0, ans=15.0 2024-08-17 22:08:59,643 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS 2024-08-17 22:09:00,989 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 from AS 2024-08-17 22:09:10,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11200, loss[loss=0.103, beats_loss=0.0103, ecapa_loss=0.0001417, whisper_loss=0.09126, over 22108.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01039, ecapa_loss=0.0001474, whisper_loss=0.09217, over 3919103.43 frames. ], batch size: 88, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:09:15,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.61 vs. limit=22.5 2024-08-17 22:09:25,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3543520.0, ans=0.125 2024-08-17 22:09:35,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3543620.0, ans=0.125 2024-08-17 22:09:44,783 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
22 from LS+wenet, 24 from Vox, 48 from AS 2024-08-17 22:09:54,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3543720.0, ans=0.0 2024-08-17 22:10:03,834 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 28 from Vox, 39 from AS 2024-08-17 22:10:13,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11250, loss[loss=0.1141, beats_loss=0.009775, ecapa_loss=0.0001563, whisper_loss=0.1027, over 22883.00 frames. ], tot_loss[loss=0.1044, beats_loss=0.0104, ecapa_loss=0.0001484, whisper_loss=0.09256, over 3946461.45 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:10:17,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.330e+01 2.572e+01 2.919e+01 3.914e+02, threshold=5.145e+01, percent-clipped=2.0 2024-08-17 22:10:28,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3544020.0, ans=0.2 2024-08-17 22:10:49,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3544120.0, ans=0.125 2024-08-17 22:10:50,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3544120.0, ans=0.125 2024-08-17 22:10:59,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3544220.0, ans=0.125 2024-08-17 22:11:18,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11300, loss[loss=0.09109, beats_loss=0.01091, ecapa_loss=0.0001378, whisper_loss=0.0788, over 21908.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01039, ecapa_loss=0.0001488, whisper_loss=0.09215, over 3926853.24 frames. ], batch size: 90, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:11:21,438 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
17 from LS+wenet, 17 from Vox, 35 from AS 2024-08-17 22:11:22,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3544420.0, ans=0.125 2024-08-17 22:11:28,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.26 vs. limit=10.0 2024-08-17 22:11:42,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3544520.0, ans=0.125 2024-08-17 22:11:49,988 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:11:51,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3544620.0, ans=0.125 2024-08-17 22:12:00,756 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 27 from Vox, 30 from AS 2024-08-17 22:12:05,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.11 vs. limit=15.0 2024-08-17 22:12:11,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3544820.0, ans=0.0 2024-08-17 22:12:17,065 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 from AS 2024-08-17 22:12:22,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3544820.0, ans=0.0 2024-08-17 22:12:26,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11350, loss[loss=0.1239, beats_loss=0.007801, ecapa_loss=0.0001448, whisper_loss=0.1147, over 21879.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01042, ecapa_loss=0.0001489, whisper_loss=0.09207, over 3917438.66 frames.
], batch size: 83, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:12:29,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.322e+01 2.583e+01 3.031e+01 6.064e+01, threshold=5.166e+01, percent-clipped=1.0 2024-08-17 22:12:32,514 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 22 from Vox, 21 from AS 2024-08-17 22:12:42,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3545020.0, ans=10.0 2024-08-17 22:12:46,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3545020.0, ans=0.1 2024-08-17 22:13:00,904 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 from AS 2024-08-17 22:13:02,530 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 from AS 2024-08-17 22:13:11,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3545220.0, ans=0.125 2024-08-17 22:13:15,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3545220.0, ans=0.0 2024-08-17 22:13:33,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11400, loss[loss=0.1207, beats_loss=0.009306, ecapa_loss=0.0001803, whisper_loss=0.1096, over 19428.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01043, ecapa_loss=0.000149, whisper_loss=0.09173, over 3868740.64 frames. ], batch size: 80, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:13:57,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=15.0 2024-08-17 22:13:57,826 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts.
23 from LS+wenet, 19 from Vox, 36 from AS 2024-08-17 22:14:18,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3545720.0, ans=0.125 2024-08-17 22:14:25,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3545720.0, ans=0.1 2024-08-17 22:14:25,817 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.28 vs. limit=10.0 2024-08-17 22:14:28,464 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 from AS 2024-08-17 22:14:40,449 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS 2024-08-17 22:14:43,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11450, loss[loss=0.1227, beats_loss=0.009351, ecapa_loss=0.0001499, whisper_loss=0.1119, over 20839.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01048, ecapa_loss=0.000148, whisper_loss=0.09134, over 3872734.80 frames. ], batch size: 83, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:14:44,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3545920.0, ans=0.0 2024-08-17 22:14:45,305 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
24 from LS+wenet, 28 from Vox, 33 from AS 2024-08-17 22:14:47,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.375e+01 2.632e+01 2.898e+01 5.397e+01, threshold=5.264e+01, percent-clipped=1.0 2024-08-17 22:14:49,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3545920.0, ans=0.0 2024-08-17 22:14:52,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3545920.0, ans=0.05 2024-08-17 22:15:40,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3546320.0, ans=0.0 2024-08-17 22:15:45,220 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 from AS 2024-08-17 22:15:52,273 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 29 from Vox, 32 from AS 2024-08-17 22:15:56,054 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11500, loss[loss=0.09038, beats_loss=0.013, ecapa_loss=0.0001477, whisper_loss=0.0759, over 21446.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001469, whisper_loss=0.09109, over 3900200.69 frames. ], batch size: 87, lr: 2.54e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:16:00,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3546420.0, ans=0.0 2024-08-17 22:16:09,308 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 23 from LS+wenet, 28 from Vox, 44 from AS 2024-08-17 22:16:20,396 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts.
20 from LS+wenet, 21 from Vox, 40 from AS 2024-08-17 22:16:25,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3546620.0, ans=0.125 2024-08-17 22:16:35,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3546620.0, ans=0.2 2024-08-17 22:16:39,058 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 22 from Vox, 33 from AS 2024-08-17 22:16:46,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3546720.0, ans=0.2 2024-08-17 22:16:50,894 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:16:53,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3546720.0, ans=0.0 2024-08-17 22:17:03,262 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 21 from Vox, 21 from AS 2024-08-17 22:17:05,003 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 21 from Vox, 17 from AS 2024-08-17 22:17:10,424 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 20 from Vox, 28 from AS 2024-08-17 22:17:11,784 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 from AS 2024-08-17 22:17:12,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3546920.0, ans=0.2 2024-08-17 22:17:13,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11550, loss[loss=0.07027, beats_loss=0.01211, ecapa_loss=0.000133, whisper_loss=0.05682, over 13685.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001464, whisper_loss=0.09158, over 3913491.94 frames.
], batch size: 56, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:17:17,595 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.251e+01 2.574e+01 2.799e+01 8.248e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-17 22:17:17,724 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 from AS 2024-08-17 22:17:26,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3547020.0, ans=0.125 2024-08-17 22:17:37,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3547020.0, ans=0.0 2024-08-17 22:17:39,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3547020.0, ans=0.2 2024-08-17 22:18:09,511 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 15 from Vox, 22 from AS 2024-08-17 22:18:18,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3547320.0, ans=0.125 2024-08-17 22:18:39,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11600, loss[loss=0.1066, beats_loss=0.01029, ecapa_loss=0.0001593, whisper_loss=0.09473, over 22678.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.0001468, whisper_loss=0.09161, over 3916440.01 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:18:43,273 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 24 from Vox, 23 from AS 2024-08-17 22:19:27,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.51 vs.
limit=15.0 2024-08-17 22:19:41,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3547720.0, ans=0.125 2024-08-17 22:19:47,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=3.89 vs. limit=15.0 2024-08-17 22:20:20,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11650, loss[loss=0.1036, beats_loss=0.008702, ecapa_loss=0.0001621, whisper_loss=0.09326, over 16596.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001477, whisper_loss=0.0909, over 3921180.67 frames. ], batch size: 65, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:20:27,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.333e+01 2.551e+01 2.882e+01 3.740e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-17 22:20:36,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3547920.0, ans=0.1 2024-08-17 22:20:45,492 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 20 from Vox, 38 from AS 2024-08-17 22:20:55,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3548120.0, ans=0.0 2024-08-17 22:21:08,104 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 34 from LS+wenet, 13 from Vox, 17 from AS 2024-08-17 22:21:10,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3548220.0, ans=0.125 2024-08-17 22:21:16,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3548220.0, ans=0.1 2024-08-17 22:21:20,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.08 vs.
limit=22.5 2024-08-17 22:21:21,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3548220.0, ans=0.125 2024-08-17 22:21:30,045 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:21:33,882 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 20 from Vox, 52 from AS 2024-08-17 22:21:40,171 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11700, loss[loss=0.114, beats_loss=0.0075, ecapa_loss=0.0001787, whisper_loss=0.1048, over 14974.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001479, whisper_loss=0.09078, over 3922092.94 frames. ], batch size: 62, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:22:51,298 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 22 from Vox, 43 from AS 2024-08-17 22:23:08,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3548720.0, ans=0.09899494936611666 2024-08-17 22:23:11,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3548720.0, ans=0.0 2024-08-17 22:23:13,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3548720.0, ans=0.2 2024-08-17 22:23:28,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2024-08-17 22:23:37,563 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11750, loss[loss=0.09221, beats_loss=0.01245, ecapa_loss=0.0001673, whisper_loss=0.07809, over 18824.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001479, whisper_loss=0.09109, over 3930491.49 frames.
], batch size: 81, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:23:42,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.390e+01 2.565e+01 2.987e+01 4.892e+01, threshold=5.130e+01, percent-clipped=0.0 2024-08-17 22:24:01,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3549020.0, ans=0.125 2024-08-17 22:24:16,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3549120.0, ans=0.2 2024-08-17 22:24:16,694 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.01 vs. limit=22.5 2024-08-17 22:24:27,439 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS 2024-08-17 22:24:32,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2024-08-17 22:24:39,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3549220.0, ans=0.125 2024-08-17 22:24:40,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.71 vs.
limit=15.0 2024-08-17 22:24:48,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3549220.0, ans=0.125 2024-08-17 22:24:52,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3549320.0, ans=0.0 2024-08-17 22:25:00,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3549320.0, ans=0.125 2024-08-17 22:25:11,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11800, loss[loss=0.1065, beats_loss=0.01051, ecapa_loss=0.0001543, whisper_loss=0.09443, over 15009.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01064, ecapa_loss=0.0001479, whisper_loss=0.09128, over 3933180.43 frames. ], batch size: 58, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:25:32,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3549520.0, ans=0.0 2024-08-17 22:25:44,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3549520.0, ans=0.125 2024-08-17 22:25:47,778 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 25 from Vox, 34 from AS 2024-08-17 22:25:55,870 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 19 from Vox, 25 from AS 2024-08-17 22:26:01,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3549620.0, ans=0.0 2024-08-17 22:26:05,056 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-17 22:26:26,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3549720.0, ans=0.0 2024-08-17 22:26:38,555 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
33 from LS+wenet, 20 from Vox, 35 from AS 2024-08-17 22:26:55,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11850, loss[loss=0.1053, beats_loss=0.008643, ecapa_loss=0.0001362, whisper_loss=0.09533, over 18494.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01064, ecapa_loss=0.0001481, whisper_loss=0.09088, over 3917798.76 frames. ], batch size: 72, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:26:55,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3549920.0, ans=0.125 2024-08-17 22:27:02,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.313e+01 2.497e+01 2.701e+01 4.196e+01, threshold=4.994e+01, percent-clipped=0.0 2024-08-17 22:27:11,829 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:27:42,225 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 23 from Vox, 39 from AS 2024-08-17 22:27:42,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3550020.0, ans=0.0 2024-08-17 22:28:28,311 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 19 from Vox, 21 from AS 2024-08-17 22:28:53,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11900, loss[loss=0.1102, beats_loss=0.0102, ecapa_loss=0.0001466, whisper_loss=0.09856, over 21518.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01062, ecapa_loss=0.0001487, whisper_loss=0.09155, over 3911397.42 frames. ], batch size: 86, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:28:54,109 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 20 from Vox, 42 from AS 2024-08-17 22:29:18,943 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
27 from LS+wenet, 22 from Vox, 38 from AS 2024-08-17 22:29:52,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3550620.0, ans=0.2 2024-08-17 22:29:58,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3550620.0, ans=0.125 2024-08-17 22:30:17,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.99 vs. limit=15.0 2024-08-17 22:30:32,378 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 from AS 2024-08-17 22:30:46,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 11950, loss[loss=0.1039, beats_loss=0.009616, ecapa_loss=0.0001465, whisper_loss=0.09279, over 20918.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01058, ecapa_loss=0.0001489, whisper_loss=0.09142, over 3898664.55 frames. ], batch size: 80, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:30:49,843 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 22 from Vox, 42 from AS 2024-08-17 22:30:53,069 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.173e+01 2.418e+01 2.712e+01 4.261e+01, threshold=4.835e+01, percent-clipped=0.0 2024-08-17 22:31:19,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0 2024-08-17 22:32:19,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12000, loss[loss=0.1122, beats_loss=0.00986, ecapa_loss=0.0001825, whisper_loss=0.1005, over 17992.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01054, ecapa_loss=0.0001477, whisper_loss=0.0914, over 3875617.79 frames.
], batch size: 73, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:32:19,373 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 22:33:02,571 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005236, whisper_loss=0.2457, over 922467.00 frames. 2024-08-17 22:33:16,238 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on SV_voxceleb1: loss=0.004219, beats_loss=0, ecapa_loss=0.0004219, whisper_loss=0, over 939242.00 frames. 2024-08-17 22:33:36,735 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7679, 2.2084, 2.3929, 2.3524], device='cuda:3') 2024-08-17 22:35:20,233 INFO [train_multi_KD3.py:1149] (3/4) Epoch 24, validation on AT_audioset: loss=0.02322, beats_loss=0.02322, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 22:35:20,237 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 22:35:44,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3551520.0, ans=0.125 2024-08-17 22:35:46,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3551520.0, ans=0.125 2024-08-17 22:35:48,003 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 from AS 2024-08-17 22:36:12,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3551720.0, ans=0.125 2024-08-17 22:36:19,568 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 22 from Vox, 26 from AS 2024-08-17 22:36:37,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12050, loss[loss=0.07921, beats_loss=0.01152, ecapa_loss=0.0001408, whisper_loss=0.06627, over 22222.00 frames.
], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001477, whisper_loss=0.09117, over 3876225.82 frames. ], batch size: 91, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:36:37,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3551920.0, ans=0.0 2024-08-17 22:36:41,540 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.293e+01 2.540e+01 2.892e+01 1.917e+02, threshold=5.080e+01, percent-clipped=1.0 2024-08-17 22:36:43,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3551920.0, ans=0.125 2024-08-17 22:37:48,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3552320.0, ans=0.1 2024-08-17 22:37:50,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3552320.0, ans=0.0 2024-08-17 22:37:51,764 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 21 from Vox, 23 from AS 2024-08-17 22:37:53,185 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS 2024-08-17 22:37:54,084 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12100, loss[loss=0.1127, beats_loss=0.01119, ecapa_loss=0.0001725, whisper_loss=0.09975, over 21849.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001483, whisper_loss=0.09137, over 3894127.32 frames.
], batch size: 91, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:38:04,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3552420.0, ans=0.125 2024-08-17 22:38:09,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3552520.0, ans=0.2 2024-08-17 22:38:11,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3552520.0, ans=0.05 2024-08-17 22:38:27,706 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 15 from LS+wenet, 28 from Vox, 38 from AS 2024-08-17 22:38:36,152 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 from AS 2024-08-17 22:38:48,430 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 24 from Vox, 33 from AS 2024-08-17 22:39:10,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12150, loss[loss=0.1274, beats_loss=0.009442, ecapa_loss=0.0001255, whisper_loss=0.1167, over 23852.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01032, ecapa_loss=0.0001483, whisper_loss=0.09163, over 3872779.00 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:39:14,795 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.279e+01 2.475e+01 2.710e+01 6.792e+01, threshold=4.950e+01, percent-clipped=1.0 2024-08-17 22:39:16,225 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 29 from Vox, 30 from AS 2024-08-17 22:40:10,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3553320.0, ans=0.1 2024-08-17 22:40:14,642 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 from AS 2024-08-17 22:40:18,895 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
28 from LS+wenet, 16 from Vox, 31 from AS 2024-08-17 22:40:22,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3553420.0, ans=10.0 2024-08-17 22:40:22,889 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12200, loss[loss=0.08933, beats_loss=0.01329, ecapa_loss=0.0001049, whisper_loss=0.07499, over 18149.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01038, ecapa_loss=0.0001477, whisper_loss=0.09171, over 3888267.71 frames. ], batch size: 71, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:40:26,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3553420.0, ans=0.0 2024-08-17 22:40:58,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3553620.0, ans=0.0 2024-08-17 22:41:09,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3553720.0, ans=0.0 2024-08-17 22:41:33,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3553820.0, ans=0.125 2024-08-17 22:41:35,244 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12250, loss[loss=0.0972, beats_loss=0.01238, ecapa_loss=0.0001029, whisper_loss=0.08379, over 20438.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0104, ecapa_loss=0.000147, whisper_loss=0.09148, over 3850319.29 frames.
], batch size: 78, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:41:39,673 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.433e+01 2.663e+01 3.002e+01 4.108e+01, threshold=5.326e+01, percent-clipped=0.0 2024-08-17 22:41:55,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3554020.0, ans=0.2 2024-08-17 22:42:03,759 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-17 22:42:14,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3554120.0, ans=0.0 2024-08-17 22:42:16,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3554120.0, ans=0.04949747468305833 2024-08-17 22:42:31,384 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-17 22:42:38,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3554320.0, ans=0.1 2024-08-17 22:42:47,856 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12300, loss[loss=0.1303, beats_loss=0.008042, ecapa_loss=0.0001666, whisper_loss=0.1206, over 13884.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001466, whisper_loss=0.09068, over 3817744.29 frames. 
], batch size: 53, lr: 2.54e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:43:04,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3554520.0, ans=0.0 2024-08-17 22:43:06,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3554520.0, ans=0.2 2024-08-17 22:43:16,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3554620.0, ans=0.0 2024-08-17 22:43:16,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3554620.0, ans=0.05 2024-08-17 22:43:28,639 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-17 22:43:39,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3554720.0, ans=0.125 2024-08-17 22:44:00,841 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12350, loss[loss=0.1203, beats_loss=0.00844, ecapa_loss=0.0001785, whisper_loss=0.11, over 22223.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001476, whisper_loss=0.09064, over 3837275.68 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:44:05,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.318e+01 2.520e+01 2.807e+01 5.445e+01, threshold=5.040e+01, percent-clipped=1.0 2024-08-17 22:44:28,194 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-17 22:44:31,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555120.0, ans=0.1 2024-08-17 22:44:52,134 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 22:45:04,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3555320.0, ans=0.2 2024-08-17 22:45:13,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12400, loss[loss=0.1179, beats_loss=0.00922, ecapa_loss=0.0001211, whisper_loss=0.1075, over 17351.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001471, whisper_loss=0.09121, over 3844505.24 frames. ], batch size: 64, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:45:22,791 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 22:45:25,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-17 22:45:40,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3555620.0, ans=0.1 2024-08-17 22:45:41,580 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-17 22:45:49,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3555620.0, ans=0.2 2024-08-17 22:46:22,174 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 29 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 22:46:22,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.72 vs. limit=22.5 2024-08-17 22:46:23,491 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12450, loss[loss=0.1213, beats_loss=0.0095, ecapa_loss=0.0001661, whisper_loss=0.1101, over 18217.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001476, whisper_loss=0.09072, over 3866693.75 frames. 
], batch size: 71, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:46:27,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.268e+01 2.505e+01 2.895e+01 6.006e+01, threshold=5.010e+01, percent-clipped=2.0 2024-08-17 22:46:34,939 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-17 22:46:44,199 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-17 22:46:46,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3556020.0, ans=0.0 2024-08-17 22:46:58,019 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-17 22:47:16,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3556220.0, ans=0.0 2024-08-17 22:47:33,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12500, loss[loss=0.09678, beats_loss=0.01105, ecapa_loss=0.00015, whisper_loss=0.08423, over 20940.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0105, ecapa_loss=0.0001471, whisper_loss=0.09045, over 3867416.39 frames. ], batch size: 87, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:47:40,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3556420.0, ans=0.125 2024-08-17 22:47:53,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3556520.0, ans=0.125 2024-08-17 22:47:53,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3556520.0, ans=0.09899494936611666 2024-08-17 22:47:59,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.05 vs. 
limit=10.0 2024-08-17 22:48:02,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2024-08-17 22:48:09,634 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 39 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-17 22:48:13,923 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-17 22:48:25,573 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 22:48:41,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3556920.0, ans=0.2 2024-08-17 22:48:41,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12550, loss[loss=0.1035, beats_loss=0.01022, ecapa_loss=0.0001253, whisper_loss=0.09199, over 14243.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.000147, whisper_loss=0.09039, over 3884592.91 frames. ], batch size: 54, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:48:42,011 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 22:48:45,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3556920.0, ans=0.035 2024-08-17 22:48:46,293 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.984e+01 2.357e+01 2.585e+01 2.988e+01 4.779e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-17 22:48:51,950 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-17 22:48:57,490 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-17 22:49:03,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3557020.0, ans=0.2 2024-08-17 22:49:16,813 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-17 22:49:20,021 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 22:49:27,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3557220.0, ans=0.0 2024-08-17 22:49:29,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3557220.0, ans=0.125 2024-08-17 22:49:51,220 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12600, loss[loss=0.07086, beats_loss=0.01255, ecapa_loss=0.0001656, whisper_loss=0.05666, over 14485.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001468, whisper_loss=0.09071, over 3900836.40 frames. ], batch size: 59, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:49:58,324 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-17 22:50:32,772 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-17 22:50:39,717 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-17 22:51:00,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12650, loss[loss=0.1265, beats_loss=0.01043, ecapa_loss=0.0001273, whisper_loss=0.1148, over 17641.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001465, whisper_loss=0.09042, over 3862243.65 frames. 
], batch size: 63, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:51:04,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.293e+01 2.533e+01 2.794e+01 5.900e+01, threshold=5.066e+01, percent-clipped=1.0 2024-08-17 22:51:21,674 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 22:51:25,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3558020.0, ans=15.0 2024-08-17 22:51:35,673 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 12 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-17 22:51:46,196 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-17 22:51:46,742 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:51:48,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2024-08-17 22:52:08,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12700, loss[loss=0.1304, beats_loss=0.00967, ecapa_loss=0.0001517, whisper_loss=0.1193, over 23609.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001461, whisper_loss=0.09013, over 3864767.58 frames. ], batch size: 91, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:52:11,180 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.68 vs. 
limit=15.0 2024-08-17 22:52:31,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3558520.0, ans=0.0 2024-08-17 22:52:35,575 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.031e-02 2024-08-17 22:52:46,866 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 22:52:47,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3558620.0, ans=0.125 2024-08-17 22:52:49,686 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-17 22:52:51,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.11 vs. limit=10.0 2024-08-17 22:52:52,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3558720.0, ans=0.125 2024-08-17 22:53:17,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.76 vs. limit=22.5 2024-08-17 22:53:18,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12750, loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001882, whisper_loss=0.09123, over 22184.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01068, ecapa_loss=0.0001471, whisper_loss=0.091, over 3925926.85 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:53:18,602 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-17 22:53:21,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3558920.0, ans=0.0 2024-08-17 22:53:21,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=12.0 2024-08-17 22:53:22,184 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.283e+01 2.578e+01 2.885e+01 4.284e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-17 22:53:34,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3559020.0, ans=0.95 2024-08-17 22:53:54,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3559120.0, ans=0.0 2024-08-17 22:53:55,690 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-17 22:54:00,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2024-08-17 22:54:18,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3559320.0, ans=0.0 2024-08-17 22:54:26,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12800, loss[loss=0.111, beats_loss=0.01032, ecapa_loss=0.000175, whisper_loss=0.0989, over 20015.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01069, ecapa_loss=0.0001479, whisper_loss=0.09087, over 3911181.25 frames. ], batch size: 79, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:54:34,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.11 vs. 
limit=22.5 2024-08-17 22:54:38,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3559420.0, ans=0.125 2024-08-17 22:54:46,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3559520.0, ans=0.2 2024-08-17 22:54:49,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3559520.0, ans=0.125 2024-08-17 22:54:59,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3559620.0, ans=0.2 2024-08-17 22:55:10,713 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:55:26,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.25 vs. limit=15.0 2024-08-17 22:55:31,977 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 22:55:33,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3559820.0, ans=0.125 2024-08-17 22:55:33,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3559820.0, ans=0.0 2024-08-17 22:55:37,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12850, loss[loss=0.1047, beats_loss=0.0106, ecapa_loss=0.0001446, whisper_loss=0.09265, over 22433.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01064, ecapa_loss=0.0001476, whisper_loss=0.09178, over 3909444.05 frames. 
], batch size: 90, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:55:37,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3559920.0, ans=0.125 2024-08-17 22:55:40,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3559920.0, ans=0.125 2024-08-17 22:55:41,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.263e+01 2.521e+01 2.838e+01 3.742e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-17 22:56:08,939 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 27 from Vox, 24 fro AS 2024-08-17 22:56:21,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3560220.0, ans=0.125 2024-08-17 22:56:24,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2024-08-17 22:56:34,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.40 vs. limit=15.0 2024-08-17 22:56:44,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3560320.0, ans=0.125 2024-08-17 22:56:47,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3560320.0, ans=0.0 2024-08-17 22:56:49,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12900, loss[loss=0.09539, beats_loss=0.01025, ecapa_loss=0.0001373, whisper_loss=0.08377, over 19894.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01074, ecapa_loss=0.0001464, whisper_loss=0.08977, over 3894925.71 frames. 
], batch size: 81, lr: 2.53e-03, grad_scale: 1.152921504606847e+18 2024-08-17 22:56:56,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3560420.0, ans=0.125 2024-08-17 22:57:06,581 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-17 22:57:12,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-17 22:57:12,031 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.72 vs. limit=15.0 2024-08-17 22:57:18,250 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-17 22:57:27,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3560620.0, ans=0.0 2024-08-17 22:57:31,331 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-17 22:57:39,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3560720.0, ans=0.125 2024-08-17 22:57:40,987 WARNING [optim.py:496] (3/4) Scaling gradients by 0.045217473059892654, model_norm_threshold=50.41379928588867 2024-08-17 22:57:41,158 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.812e+05, grad_sumsq=1.812e+05, orig_rms_sq=1.000e+00 2024-08-17 22:57:43,977 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-17 22:57:54,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3560820.0, ans=0.125 2024-08-17 22:58:02,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 12950, loss[loss=0.083, beats_loss=0.006661, ecapa_loss=0.0001879, whisper_loss=0.07446, over 15436.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001472, whisper_loss=0.08987, over 3857790.94 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:58:07,898 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.175e+01 2.399e+01 2.908e+01 1.115e+03, threshold=4.798e+01, percent-clipped=1.0 2024-08-17 22:58:32,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3561120.0, ans=0.1 2024-08-17 22:58:46,565 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-17 22:58:46,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3561220.0, ans=0.125 2024-08-17 22:58:52,580 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 22:58:55,350 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-17 22:59:11,490 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-17 22:59:11,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3561320.0, ans=0.05 2024-08-17 22:59:15,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13000, loss[loss=0.1118, beats_loss=0.01053, ecapa_loss=0.0001581, whisper_loss=0.09968, over 22787.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001483, whisper_loss=0.09055, over 3902011.77 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 22:59:29,929 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-17 22:59:34,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3561520.0, ans=0.07 2024-08-17 22:59:45,348 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 22:59:48,030 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-17 23:00:00,283 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-17 23:00:00,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2024-08-17 23:00:03,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3561720.0, ans=0.125 2024-08-17 23:00:21,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3561820.0, ans=0.0 2024-08-17 23:00:29,002 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13050, loss[loss=0.1058, beats_loss=0.01121, ecapa_loss=0.0001414, whisper_loss=0.09317, over 22887.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001479, whisper_loss=0.09133, over 3905063.80 frames. 
], batch size: 92, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:00:34,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.290e+01 2.565e+01 2.843e+01 4.740e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-17 23:01:01,962 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08320149034261703, model_norm_threshold=51.3093376159668 2024-08-17 23:01:02,131 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.121e+04, grad_sumsq=1.818e+04, orig_rms_sq=3.366e+00 2024-08-17 23:01:02,526 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 23:01:06,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2024-08-17 23:01:07,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3562120.0, ans=0.0 2024-08-17 23:01:13,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3562220.0, ans=0.2 2024-08-17 23:01:31,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3562320.0, ans=0.1 2024-08-17 23:01:34,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3562320.0, ans=0.025 2024-08-17 23:01:44,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13100, loss[loss=0.08757, beats_loss=0.0115, ecapa_loss=0.0001776, whisper_loss=0.07429, over 21222.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.000148, whisper_loss=0.09106, over 3887051.25 frames. 
], batch size: 93, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:02:14,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.78 vs. limit=15.0 2024-08-17 23:02:39,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3562720.0, ans=0.125 2024-08-17 23:03:00,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13150, loss[loss=0.1262, beats_loss=0.008504, ecapa_loss=0.0001484, whisper_loss=0.1162, over 18391.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001478, whisper_loss=0.09113, over 3889712.67 frames. ], batch size: 74, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:03:06,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.009e+01 2.414e+01 2.698e+01 3.144e+01 6.167e+02, threshold=5.396e+01, percent-clipped=2.0 2024-08-17 23:03:17,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3563020.0, ans=0.04949747468305833 2024-08-17 23:03:20,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3563020.0, ans=0.125 2024-08-17 23:03:35,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.99 vs. limit=6.0 2024-08-17 23:03:39,104 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-17 23:04:07,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. 
limit=15.0 2024-08-17 23:04:10,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3563320.0, ans=0.2 2024-08-17 23:04:15,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13200, loss[loss=0.0791, beats_loss=0.01327, ecapa_loss=0.000114, whisper_loss=0.0647, over 18231.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.000148, whisper_loss=0.0909, over 3844416.92 frames. ], batch size: 73, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:04:26,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3563420.0, ans=0.07 2024-08-17 23:05:17,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3563820.0, ans=0.125 2024-08-17 23:05:18,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3563820.0, ans=0.125 2024-08-17 23:05:28,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13250, loss[loss=0.09196, beats_loss=0.01005, ecapa_loss=0.0001697, whisper_loss=0.08021, over 21739.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001488, whisper_loss=0.09082, over 3851496.17 frames. ], batch size: 92, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:05:34,393 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.345e+01 2.575e+01 2.975e+01 4.743e+02, threshold=5.149e+01, percent-clipped=2.0 2024-08-17 23:05:51,576 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-17 23:05:56,545 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.58 vs. 
limit=22.5 2024-08-17 23:05:59,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3564120.0, ans=0.125 2024-08-17 23:06:02,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3564120.0, ans=0.0 2024-08-17 23:06:20,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3564220.0, ans=0.5 2024-08-17 23:06:21,881 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-17 23:06:31,361 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-17 23:06:34,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3564320.0, ans=0.07 2024-08-17 23:06:39,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13300, loss[loss=0.1116, beats_loss=0.009689, ecapa_loss=0.0001182, whisper_loss=0.1007, over 15966.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001474, whisper_loss=0.09069, over 3840863.11 frames. ], batch size: 57, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:06:39,543 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 23:06:43,777 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-17 23:06:48,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3564420.0, ans=0.125 2024-08-17 23:07:06,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3564620.0, ans=0.0 2024-08-17 23:07:07,241 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-17 23:07:08,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.14 vs. limit=15.0 2024-08-17 23:07:09,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.01 vs. limit=12.0 2024-08-17 23:07:26,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3564720.0, ans=0.125 2024-08-17 23:07:26,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. limit=22.5 2024-08-17 23:07:27,154 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-17 23:07:36,552 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-17 23:07:39,173 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-17 23:07:48,108 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13350, loss[loss=0.1288, beats_loss=0.007672, ecapa_loss=0.0001601, whisper_loss=0.1195, over 15711.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001477, whisper_loss=0.09037, over 3836951.90 frames. ], batch size: 61, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:07:50,757 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-17 23:07:53,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.397e+01 2.708e+01 2.975e+01 4.671e+01, threshold=5.415e+01, percent-clipped=0.0 2024-08-17 23:07:53,805 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
22 from LS+wenet, 15 from Vox, 16 fro AS 2024-08-17 23:08:06,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3565020.0, ans=0.125 2024-08-17 23:08:24,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3565120.0, ans=0.125 2024-08-17 23:08:24,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.60 vs. limit=12.0 2024-08-17 23:08:28,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3565220.0, ans=0.1 2024-08-17 23:08:33,564 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-17 23:08:42,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3565320.0, ans=0.2 2024-08-17 23:08:49,800 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 30 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-17 23:08:55,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13400, loss[loss=0.08705, beats_loss=0.01022, ecapa_loss=0.0001854, whisper_loss=0.07497, over 16339.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001475, whisper_loss=0.09115, over 3829193.63 frames. ], batch size: 66, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:08:57,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3565420.0, ans=0.125 2024-08-17 23:09:19,095 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-17 23:09:22,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3565620.0, ans=0.2 2024-08-17 23:09:26,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3565620.0, ans=0.0 2024-08-17 23:09:40,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3565720.0, ans=0.125 2024-08-17 23:09:50,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3565820.0, ans=0.07 2024-08-17 23:09:57,199 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-17 23:09:59,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3565820.0, ans=0.0 2024-08-17 23:10:05,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13450, loss[loss=0.1059, beats_loss=0.01017, ecapa_loss=0.0001433, whisper_loss=0.09435, over 19880.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01044, ecapa_loss=0.0001475, whisper_loss=0.09171, over 3871325.88 frames. ], batch size: 78, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:10:11,371 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.425e+01 2.663e+01 2.956e+01 3.669e+02, threshold=5.327e+01, percent-clipped=2.0 2024-08-17 23:10:21,251 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-17 23:10:25,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.21 vs. 
limit=15.0 2024-08-17 23:10:43,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566120.0, ans=0.1 2024-08-17 23:10:52,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3566220.0, ans=0.2 2024-08-17 23:11:14,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13500, loss[loss=0.09033, beats_loss=0.009618, ecapa_loss=0.0001815, whisper_loss=0.0789, over 16056.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.000148, whisper_loss=0.09115, over 3891421.21 frames. ], batch size: 65, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:11:20,633 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-17 23:11:23,143 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-17 23:11:27,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3566520.0, ans=0.125 2024-08-17 23:11:38,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3566520.0, ans=0.1 2024-08-17 23:11:43,907 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-17 23:11:46,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3566620.0, ans=0.2 2024-08-17 23:11:52,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3566620.0, ans=0.125 2024-08-17 23:11:52,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3566620.0, ans=0.04949747468305833 2024-08-17 23:11:53,526 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
16 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 23:12:01,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3566720.0, ans=0.125 2024-08-17 23:12:06,525 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-17 23:12:08,897 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-17 23:12:21,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13550, loss[loss=0.1115, beats_loss=0.00947, ecapa_loss=0.0001729, whisper_loss=0.1003, over 19598.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01047, ecapa_loss=0.000149, whisper_loss=0.09153, over 3879411.08 frames. ], batch size: 80, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:12:26,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.373e+01 2.638e+01 2.818e+01 4.102e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-17 23:12:26,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3566920.0, ans=0.125 2024-08-17 23:12:30,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3566920.0, ans=0.2 2024-08-17 23:12:44,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3567020.0, ans=0.125 2024-08-17 23:12:49,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3567120.0, ans=0.0 2024-08-17 23:12:50,875 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-17 23:12:54,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.76 vs. 
limit=22.5 2024-08-17 23:12:55,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3567120.0, ans=0.125 2024-08-17 23:13:05,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3567220.0, ans=0.125 2024-08-17 23:13:07,315 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2024-08-17 23:13:08,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3567220.0, ans=0.2 2024-08-17 23:13:17,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3567320.0, ans=0.125 2024-08-17 23:13:24,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3567320.0, ans=0.125 2024-08-17 23:13:28,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3567420.0, ans=0.125 2024-08-17 23:13:29,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13600, loss[loss=0.08338, beats_loss=0.01062, ecapa_loss=0.0001622, whisper_loss=0.07114, over 20620.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001483, whisper_loss=0.09145, over 3901743.45 frames. ], batch size: 89, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:13:31,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-08-17 23:13:32,473 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-17 23:13:43,914 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-08-17 23:13:54,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3567520.0, ans=0.0 2024-08-17 23:14:24,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2024-08-17 23:14:28,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3567820.0, ans=0.2 2024-08-17 23:14:33,742 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 23:14:40,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13650, loss[loss=0.1137, beats_loss=0.00968, ecapa_loss=0.0001254, whisper_loss=0.1028, over 22069.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001481, whisper_loss=0.09142, over 3894816.62 frames. ], batch size: 84, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:14:43,249 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-17 23:14:47,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.376e+01 2.690e+01 3.104e+01 4.136e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-17 23:14:47,544 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 23:14:54,666 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-17 23:15:04,748 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. 
limit=15.0 2024-08-17 23:15:05,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.98 vs. limit=12.0 2024-08-17 23:15:13,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3568120.0, ans=0.0 2024-08-17 23:15:38,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3568320.0, ans=0.125 2024-08-17 23:15:40,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3568320.0, ans=0.1 2024-08-17 23:15:51,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3568320.0, ans=0.035 2024-08-17 23:15:53,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13700, loss[loss=0.1043, beats_loss=0.01071, ecapa_loss=0.0001573, whisper_loss=0.09199, over 18039.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01059, ecapa_loss=0.0001468, whisper_loss=0.09133, over 3905293.79 frames. ], batch size: 73, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:15:58,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3568420.0, ans=0.125 2024-08-17 23:16:12,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=3568520.0, ans=10.0 2024-08-17 23:16:18,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2024-08-17 23:16:22,981 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-17 23:16:26,159 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
16 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-17 23:16:49,964 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-17 23:16:54,710 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-17 23:17:03,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3568920.0, ans=0.0 2024-08-17 23:17:04,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13750, loss[loss=0.09692, beats_loss=0.009385, ecapa_loss=0.0001553, whisper_loss=0.08598, over 20507.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001458, whisper_loss=0.09137, over 3877411.85 frames. ], batch size: 84, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:17:09,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=12.0 2024-08-17 23:17:10,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.371e+01 2.635e+01 2.890e+01 4.017e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-17 23:17:13,746 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-17 23:17:17,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3569020.0, ans=0.125 2024-08-17 23:17:29,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3569020.0, ans=0.0 2024-08-17 23:17:55,930 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-17 23:17:56,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3569220.0, ans=0.0 2024-08-17 23:17:58,433 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-17 23:18:11,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3569320.0, ans=0.07 2024-08-17 23:18:13,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13800, loss[loss=0.0987, beats_loss=0.01109, ecapa_loss=0.0001493, whisper_loss=0.08612, over 18439.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01056, ecapa_loss=0.0001462, whisper_loss=0.09135, over 3907599.69 frames. ], batch size: 73, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:18:21,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3569420.0, ans=0.0 2024-08-17 23:18:26,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3569520.0, ans=0.09899494936611666 2024-08-17 23:18:34,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3569520.0, ans=0.0 2024-08-17 23:18:44,616 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 23:18:48,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-17 23:19:17,931 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-17 23:19:21,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13850, loss[loss=0.1072, beats_loss=0.00771, ecapa_loss=0.0001424, whisper_loss=0.09811, over 17726.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01059, ecapa_loss=0.0001458, whisper_loss=0.09058, over 3870312.88 frames. 
], batch size: 67, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:19:26,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3569920.0, ans=0.0 2024-08-17 23:19:26,981 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.402e+01 2.666e+01 2.960e+01 4.114e+01, threshold=5.333e+01, percent-clipped=0.0 2024-08-17 23:19:40,797 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 23:19:46,009 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 23:19:47,348 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-17 23:19:50,305 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 34 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 23:19:50,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3570120.0, ans=0.125 2024-08-17 23:20:15,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3570320.0, ans=0.04949747468305833 2024-08-17 23:20:20,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.12 vs. limit=10.0 2024-08-17 23:20:21,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3570320.0, ans=0.0 2024-08-17 23:20:28,278 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
30 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-17 23:20:28,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3570420.0, ans=0.125 2024-08-17 23:20:29,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13900, loss[loss=0.1388, beats_loss=0.006098, ecapa_loss=0.0001786, whisper_loss=0.1309, over 16554.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001462, whisper_loss=0.09144, over 3899036.20 frames. ], batch size: 65, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:20:37,821 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-17 23:20:41,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3570420.0, ans=0.2 2024-08-17 23:21:09,754 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-17 23:21:16,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3570720.0, ans=0.125 2024-08-17 23:21:18,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3570720.0, ans=0.0 2024-08-17 23:21:20,252 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.60 vs. limit=22.5 2024-08-17 23:21:21,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3570720.0, ans=0.04949747468305833 2024-08-17 23:21:27,106 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-17 23:21:31,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=15.0 2024-08-17 23:21:39,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 13950, loss[loss=0.1121, beats_loss=0.007355, ecapa_loss=0.0001522, whisper_loss=0.1032, over 16113.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.000146, whisper_loss=0.09119, over 3897296.04 frames. ], batch size: 63, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:21:44,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570920.0, ans=0.1 2024-08-17 23:21:44,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3570920.0, ans=0.0 2024-08-17 23:21:45,226 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.372e+01 2.631e+01 3.013e+01 8.564e+01, threshold=5.263e+01, percent-clipped=2.0 2024-08-17 23:21:45,439 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-17 23:21:47,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3570920.0, ans=0.1 2024-08-17 23:22:10,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3571120.0, ans=0.125 2024-08-17 23:22:29,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3571220.0, ans=0.0 2024-08-17 23:22:42,745 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-17 23:22:45,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3571320.0, ans=0.1 2024-08-17 23:22:48,580 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.968e+05 2024-08-17 23:22:50,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14000, loss[loss=0.09043, beats_loss=0.01178, ecapa_loss=0.0001095, whisper_loss=0.07755, over 16016.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01051, ecapa_loss=0.0001468, whisper_loss=0.0911, over 3913977.30 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:23:01,663 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-17 23:23:03,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3571420.0, ans=0.125 2024-08-17 23:23:17,042 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-17 23:23:19,675 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-17 23:23:22,467 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-17 23:23:25,498 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-17 23:23:26,756 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-17 23:23:28,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3571620.0, ans=0.05 2024-08-17 23:23:38,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3571720.0, ans=0.2 2024-08-17 23:24:01,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14050, loss[loss=0.08933, beats_loss=0.01074, ecapa_loss=0.0001513, whisper_loss=0.07708, over 18916.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01051, ecapa_loss=0.0001461, whisper_loss=0.09107, over 3903629.70 frames. ], batch size: 75, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:24:06,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.345e+01 2.533e+01 2.843e+01 6.962e+01, threshold=5.065e+01, percent-clipped=1.0 2024-08-17 23:24:19,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3572020.0, ans=0.125 2024-08-17 23:24:20,637 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-17 23:24:21,882 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-17 23:24:50,272 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-17 23:25:06,717 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-17 23:25:09,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14100, loss[loss=0.09874, beats_loss=0.008689, ecapa_loss=0.0001722, whisper_loss=0.08833, over 14151.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001463, whisper_loss=0.09032, over 3871668.79 frames. 
], batch size: 55, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:25:19,398 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=12.0 2024-08-17 23:25:28,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3572520.0, ans=0.125 2024-08-17 23:25:31,253 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-17 23:25:43,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3572620.0, ans=0.125 2024-08-17 23:26:17,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14150, loss[loss=0.09471, beats_loss=0.01045, ecapa_loss=0.0001238, whisper_loss=0.08302, over 18020.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001453, whisper_loss=0.08958, over 3837149.14 frames. ], batch size: 72, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:26:17,907 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-17 23:26:19,164 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 23:26:22,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.363e+01 2.612e+01 2.970e+01 1.774e+02, threshold=5.225e+01, percent-clipped=3.0 2024-08-17 23:26:27,232 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-17 23:26:33,173 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-17 23:26:33,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. 
limit=15.0 2024-08-17 23:26:40,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=15.0 2024-08-17 23:26:53,386 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-17 23:27:00,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2024-08-17 23:27:01,386 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 23 from Vox, 48 fro AS 2024-08-17 23:27:01,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3573220.0, ans=0.125 2024-08-17 23:27:16,274 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.813e+00 2024-08-17 23:27:23,115 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 23:27:24,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3573320.0, ans=0.04949747468305833 2024-08-17 23:27:25,535 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-17 23:27:26,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14200, loss[loss=0.08805, beats_loss=0.01325, ecapa_loss=0.0001156, whisper_loss=0.07364, over 23549.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01065, ecapa_loss=0.0001454, whisper_loss=0.08958, over 3881785.43 frames. 
], batch size: 94, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:27:27,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3573420.0, ans=0.0 2024-08-17 23:27:34,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3573420.0, ans=0.05 2024-08-17 23:28:20,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3573820.0, ans=0.1 2024-08-17 23:28:22,675 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-17 23:28:26,633 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-17 23:28:33,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14250, loss[loss=0.1122, beats_loss=0.006413, ecapa_loss=0.0001567, whisper_loss=0.1042, over 15069.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.0001459, whisper_loss=0.09, over 3874523.17 frames. 
], batch size: 56, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:28:34,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3573920.0, ans=0.035 2024-08-17 23:28:34,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3573920.0, ans=0.125 2024-08-17 23:28:38,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.386e+01 2.591e+01 2.986e+01 7.501e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-17 23:28:48,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3574020.0, ans=0.125 2024-08-17 23:28:54,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3574020.0, ans=0.0 2024-08-17 23:29:02,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-17 23:29:12,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3574120.0, ans=0.125 2024-08-17 23:29:14,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3574220.0, ans=0.0 2024-08-17 23:29:24,804 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-17 23:29:41,527 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-17 23:29:42,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14300, loss[loss=0.1104, beats_loss=0.01099, ecapa_loss=0.0001291, whisper_loss=0.09817, over 22286.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001467, whisper_loss=0.08978, over 3866135.50 frames. 
], batch size: 91, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:29:42,963 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-17 23:29:45,671 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-17 23:29:57,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3574520.0, ans=0.1 2024-08-17 23:30:07,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.75 vs. limit=10.0 2024-08-17 23:30:10,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.50 vs. limit=22.5 2024-08-17 23:30:13,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3574620.0, ans=0.0 2024-08-17 23:30:18,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3574620.0, ans=0.015 2024-08-17 23:30:20,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3574620.0, ans=0.125 2024-08-17 23:30:24,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3574720.0, ans=0.1 2024-08-17 23:30:24,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-17 23:30:48,760 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-17 23:30:51,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3574920.0, ans=0.04949747468305833 2024-08-17 23:30:52,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14350, loss[loss=0.1001, beats_loss=0.009048, ecapa_loss=0.000165, whisper_loss=0.08942, over 14780.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001467, whisper_loss=0.09009, over 3877692.56 frames. ], batch size: 60, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:30:52,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3574920.0, ans=0.2 2024-08-17 23:30:57,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.260e+01 2.482e+01 2.823e+01 6.177e+01, threshold=4.963e+01, percent-clipped=1.0 2024-08-17 23:31:02,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3574920.0, ans=0.125 2024-08-17 23:31:51,094 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-17 23:31:55,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3575320.0, ans=0.125 2024-08-17 23:32:01,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14400, loss[loss=0.09965, beats_loss=0.01046, ecapa_loss=0.0001247, whisper_loss=0.08794, over 17051.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001471, whisper_loss=0.09011, over 3876657.20 frames. ], batch size: 63, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:32:21,456 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-17 23:32:24,359 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 21 from Vox, 17 fro AS 2024-08-17 23:32:27,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.32 vs. limit=15.0 2024-08-17 23:32:28,786 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:32:47,129 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-17 23:32:48,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3575720.0, ans=0.2 2024-08-17 23:33:13,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3575820.0, ans=0.125 2024-08-17 23:33:13,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3575820.0, ans=0.0 2024-08-17 23:33:15,033 INFO [train_multi_KD3.py:1116] (3/4) Epoch 24, batch 14450, loss[loss=0.1165, beats_loss=0.009753, ecapa_loss=0.0001698, whisper_loss=0.105, over 21525.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001476, whisper_loss=0.09019, over 3888471.72 frames. 
], batch size: 88, lr: 2.53e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:33:21,142 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.457e+01 2.673e+01 3.011e+01 5.974e+01, threshold=5.346e+01, percent-clipped=2.0 2024-08-17 23:33:28,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3576020.0, ans=0.1 2024-08-17 23:33:39,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3576020.0, ans=0.0 2024-08-17 23:33:43,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3576120.0, ans=0.1 2024-08-17 23:33:50,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.48 vs. limit=15.0 2024-08-17 23:33:51,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3576120.0, ans=0.1 2024-08-17 23:34:06,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3576220.0, ans=0.0 2024-08-17 23:34:55,192 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 0, loss[loss=0.0967, beats_loss=0.009565, ecapa_loss=0.0001686, whisper_loss=0.08545, over 15212.00 frames. ], tot_loss[loss=0.0967, beats_loss=0.009565, ecapa_loss=0.0001686, whisper_loss=0.08545, over 15212.00 frames. ], batch size: 63, lr: 2.48e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:34:55,193 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-17 23:35:34,998 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000529, whisper_loss=0.2477, over 922467.00 frames. 
2024-08-17 23:35:49,858 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on SV_voxceleb1: loss=0.004106, beats_loss=0, ecapa_loss=0.0004106, whisper_loss=0, over 939242.00 frames. 2024-08-17 23:37:32,581 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on AT_audioset: loss=0.02333, beats_loss=0.02333, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-17 23:37:32,589 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-17 23:37:52,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2024-08-17 23:37:54,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3576320.0, ans=0.0 2024-08-17 23:38:23,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3576520.0, ans=0.125 2024-08-17 23:38:28,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3576520.0, ans=0.1 2024-08-17 23:38:32,854 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-17 23:39:31,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3576820.0, ans=0.5 2024-08-17 23:39:32,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 50, loss[loss=0.09836, beats_loss=0.0109, ecapa_loss=0.0001037, whisper_loss=0.08642, over 17078.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009446, ecapa_loss=0.0001513, whisper_loss=0.09054, over 900752.28 frames. 
], batch size: 65, lr: 2.48e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:39:36,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3576820.0, ans=0.125 2024-08-17 23:39:37,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3576820.0, ans=12.0 2024-08-17 23:39:43,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3576820.0, ans=0.0 2024-08-17 23:39:59,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2024-08-17 23:40:04,011 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.415e+01 2.699e+01 3.079e+01 5.308e+01, threshold=5.398e+01, percent-clipped=0.0 2024-08-17 23:40:05,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3576920.0, ans=0.2 2024-08-17 23:40:21,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3577020.0, ans=0.125 2024-08-17 23:40:26,178 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-17 23:40:30,447 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-17 23:40:51,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3577120.0, ans=0.125 2024-08-17 23:40:53,548 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-17 23:40:57,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3577120.0, ans=0.0 2024-08-17 23:41:21,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 100, loss[loss=0.09722, beats_loss=0.01031, ecapa_loss=0.0001392, whisper_loss=0.08552, over 22140.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009373, ecapa_loss=0.0001488, whisper_loss=0.09105, over 1543013.15 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:41:39,054 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 23:41:41,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3577320.0, ans=0.07 2024-08-17 23:41:55,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3577420.0, ans=0.0 2024-08-17 23:41:57,063 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-17 23:42:21,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3577520.0, ans=0.125 2024-08-17 23:42:30,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3577620.0, ans=0.035 2024-08-17 23:42:52,560 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 23:43:04,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 150, loss[loss=0.09758, beats_loss=0.01028, ecapa_loss=0.000156, whisper_loss=0.08575, over 22394.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.009438, ecapa_loss=0.000149, whisper_loss=0.09026, over 2035480.09 frames. 
], batch size: 90, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:43:05,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3577820.0, ans=0.0 2024-08-17 23:43:14,090 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-17 23:43:27,246 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.000e+01 2.543e+01 2.785e+01 3.052e+01 4.688e+01, threshold=5.571e+01, percent-clipped=0.0 2024-08-17 23:43:31,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3577920.0, ans=15.0 2024-08-17 23:43:36,252 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.279e+05 2024-08-17 23:44:11,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3578220.0, ans=0.05 2024-08-17 23:44:15,362 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-17 23:44:23,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 200, loss[loss=0.08623, beats_loss=0.01215, ecapa_loss=0.0001411, whisper_loss=0.07267, over 15049.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.00967, ecapa_loss=0.0001505, whisper_loss=0.09048, over 2404729.81 frames. ], batch size: 62, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:44:50,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3578520.0, ans=0.125 2024-08-17 23:44:50,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3578520.0, ans=0.1 2024-08-17 23:44:56,549 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-17 23:45:03,678 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-17 23:45:17,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.36 vs. limit=6.0 2024-08-17 23:45:27,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3578720.0, ans=0.125 2024-08-17 23:45:33,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 250, loss[loss=0.09183, beats_loss=0.01445, ecapa_loss=0.000127, whisper_loss=0.07611, over 19936.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01002, ecapa_loss=0.0001481, whisper_loss=0.08927, over 2717662.94 frames. ], batch size: 77, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:45:44,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3578820.0, ans=0.2 2024-08-17 23:45:45,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3578820.0, ans=0.0 2024-08-17 23:45:51,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3578920.0, ans=0.0 2024-08-17 23:45:53,631 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.761e+01 2.355e+01 2.707e+01 2.999e+01 3.108e+02, threshold=5.414e+01, percent-clipped=1.0 2024-08-17 23:45:55,082 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 23:45:58,064 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-17 23:46:03,473 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
37 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-17 23:46:06,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3579020.0, ans=0.07 2024-08-17 23:46:07,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2024-08-17 23:46:10,063 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-17 23:46:41,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 300, loss[loss=0.1033, beats_loss=0.00854, ecapa_loss=0.0001482, whisper_loss=0.09332, over 22917.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01009, ecapa_loss=0.0001497, whisper_loss=0.0896, over 2985894.10 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:46:48,022 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:46:55,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.48 vs. limit=6.0 2024-08-17 23:47:06,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3579420.0, ans=0.0 2024-08-17 23:47:08,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0 2024-08-17 23:47:11,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=12.00 vs. limit=12.0 2024-08-17 23:47:13,107 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-17 23:47:26,844 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-17 23:47:40,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3579720.0, ans=0.2 2024-08-17 23:47:48,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 350, loss[loss=0.1071, beats_loss=0.01141, ecapa_loss=0.0001594, whisper_loss=0.09408, over 21764.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01023, ecapa_loss=0.0001488, whisper_loss=0.08919, over 3179530.64 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:47:52,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3579820.0, ans=0.125 2024-08-17 23:47:53,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3579820.0, ans=0.125 2024-08-17 23:48:02,321 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-17 23:48:07,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.271e+01 2.566e+01 2.908e+01 1.421e+02, threshold=5.133e+01, percent-clipped=1.0 2024-08-17 23:48:32,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3580120.0, ans=0.125 2024-08-17 23:48:35,713 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-17 23:48:45,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3580220.0, ans=0.0 2024-08-17 23:48:45,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3580220.0, ans=0.0 2024-08-17 23:48:56,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 400, loss[loss=0.09879, beats_loss=0.01141, ecapa_loss=0.0001473, whisper_loss=0.08591, over 22795.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01039, ecapa_loss=0.0001473, whisper_loss=0.08909, over 3350272.19 frames. ], batch size: 92, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-17 23:49:09,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-08-17 23:49:13,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-08-17 23:49:15,300 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-17 23:49:18,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3580420.0, ans=0.125 2024-08-17 23:49:24,522 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-17 23:49:24,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3580520.0, ans=0.0 2024-08-17 23:49:30,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3580520.0, ans=0.125 2024-08-17 23:49:33,148 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-17 23:49:48,124 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-17 23:49:49,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3580720.0, ans=0.09899494936611666 2024-08-17 23:49:55,187 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-17 23:49:59,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3580720.0, ans=0.0 2024-08-17 23:50:04,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 450, loss[loss=0.1137, beats_loss=0.008446, ecapa_loss=0.0001503, whisper_loss=0.1038, over 20615.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001473, whisper_loss=0.08946, over 3463783.60 frames. ], batch size: 80, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:50:23,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.243e+01 2.550e+01 2.884e+01 5.686e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-17 23:50:26,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3580920.0, ans=0.125 2024-08-17 23:50:38,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3581020.0, ans=0.2 2024-08-17 23:50:39,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3581020.0, ans=0.1 2024-08-17 23:50:50,976 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-17 23:50:53,815 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-17 23:51:11,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 500, loss[loss=0.1139, beats_loss=0.01, ecapa_loss=0.0001422, whisper_loss=0.1025, over 22111.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001472, whisper_loss=0.08974, over 3568358.36 frames. 
], batch size: 88, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:51:19,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3581320.0, ans=15.0 2024-08-17 23:51:22,560 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-17 23:51:30,803 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-17 23:51:34,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3581420.0, ans=0.125 2024-08-17 23:51:35,099 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-17 23:51:38,126 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-17 23:51:44,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.68 vs. limit=22.5 2024-08-17 23:51:56,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-17 23:52:00,683 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-17 23:52:08,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-17 23:52:19,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 550, loss[loss=0.1237, beats_loss=0.007975, ecapa_loss=0.0002012, whisper_loss=0.1137, over 18565.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001464, whisper_loss=0.0901, over 3617422.65 frames. 
], batch size: 75, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:52:38,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.286e+01 2.510e+01 2.772e+01 4.019e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-17 23:52:42,385 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-17 23:53:07,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3582120.0, ans=0.1 2024-08-17 23:53:15,343 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-17 23:53:27,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 600, loss[loss=0.1082, beats_loss=0.007938, ecapa_loss=0.0001532, whisper_loss=0.09872, over 20730.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001453, whisper_loss=0.09004, over 3669177.49 frames. ], batch size: 79, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:53:29,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3582320.0, ans=0.125 2024-08-17 23:53:37,248 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-17 23:53:53,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3582520.0, ans=0.125 2024-08-17 23:54:00,011 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 33 from Vox, 31 fro AS 2024-08-17 23:54:10,901 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-17 23:54:19,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3582620.0, ans=0.0 2024-08-17 23:54:31,486 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-17 23:54:35,279 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 650, loss[loss=0.1052, beats_loss=0.01056, ecapa_loss=0.0001191, whisper_loss=0.09347, over 19766.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001459, whisper_loss=0.08954, over 3691303.70 frames. ], batch size: 75, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:54:35,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3582820.0, ans=0.2 2024-08-17 23:54:39,172 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-17 23:54:51,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3582920.0, ans=0.125 2024-08-17 23:54:52,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3582920.0, ans=0.0 2024-08-17 23:54:53,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.263e+01 2.531e+01 2.852e+01 5.403e+01, threshold=5.063e+01, percent-clipped=1.0 2024-08-17 23:55:36,012 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-17 23:55:42,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 700, loss[loss=0.1012, beats_loss=0.01142, ecapa_loss=0.0001156, whisper_loss=0.08861, over 21658.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01037, ecapa_loss=0.000146, whisper_loss=0.08967, over 3716772.44 frames. ], batch size: 82, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:55:42,796 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-17 23:55:45,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3583320.0, ans=0.125 2024-08-17 23:55:48,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3583320.0, ans=0.125 2024-08-17 23:55:52,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3583320.0, ans=0.0 2024-08-17 23:55:52,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=15.0 2024-08-17 23:55:54,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3583420.0, ans=0.035 2024-08-17 23:56:16,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3583520.0, ans=0.125 2024-08-17 23:56:17,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2024-08-17 23:56:22,510 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-17 23:56:24,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=12.0 2024-08-17 23:56:29,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3583620.0, ans=0.0 2024-08-17 23:56:50,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 750, loss[loss=0.1163, beats_loss=0.008871, ecapa_loss=0.0001338, whisper_loss=0.1061, over 23162.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001451, whisper_loss=0.08985, over 3751081.67 frames. 
], batch size: 88, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:56:55,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3583820.0, ans=0.04949747468305833 2024-08-17 23:57:10,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.877e+01 2.300e+01 2.481e+01 2.765e+01 4.539e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-17 23:57:15,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3583920.0, ans=0.1 2024-08-17 23:57:29,776 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 30 from Vox, 39 fro AS 2024-08-17 23:57:58,496 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 800, loss[loss=0.1037, beats_loss=0.009917, ecapa_loss=0.0001435, whisper_loss=0.0923, over 22755.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001442, whisper_loss=0.08952, over 3788371.98 frames. ], batch size: 87, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:58:01,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.07 vs. limit=15.0 2024-08-17 23:58:13,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2024-08-17 23:58:43,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3584620.0, ans=0.0 2024-08-17 23:58:47,824 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. 
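The `Clipping_scale=2.0, grad-norm quartiles ...` lines summarise recent gradient norms as five order statistics (min, Q1, median, Q3, max), and the logged `threshold` tracks `clipping_scale` times the median (e.g. 2.0 × 2.481e+01 ≈ 4.963e+01 above). A hedged sketch of that bookkeeping, with illustrative names rather than the actual `optim.py` API:

```python
import statistics

def clip_summary(grad_norms, clipping_scale=2.0):
    """Summarise a window of recent gradient norms as the five order
    statistics the log prints, plus a clip threshold computed as
    clipping_scale * median (consistent with the logged values)."""
    q1, med, q3 = statistics.quantiles(grad_norms, n=4)
    return (min(grad_norms), q1, med, q3, max(grad_norms),
            clipping_scale * med)

# Synthetic norms; the threshold is twice the median, here 2.0 * 25.3
norms = [18.0, 22.6, 25.3, 28.5, 54.0]
print(clip_summary(norms)[-1])
```

`percent-clipped` then reports how often a batch's gradient norm exceeded that threshold and was rescaled down to it.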
limit=10.0 2024-08-17 23:58:53,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3584720.0, ans=0.1 2024-08-17 23:59:02,334 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-17 23:59:02,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3584720.0, ans=0.125 2024-08-17 23:59:04,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 850, loss[loss=0.08178, beats_loss=0.009745, ecapa_loss=0.0001524, whisper_loss=0.07051, over 20944.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01036, ecapa_loss=0.000144, whisper_loss=0.08864, over 3795408.30 frames. ], batch size: 80, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-17 23:59:18,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3584920.0, ans=0.125 2024-08-17 23:59:24,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.267e+01 2.454e+01 2.766e+01 5.930e+01, threshold=4.908e+01, percent-clipped=1.0 2024-08-17 23:59:30,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3584920.0, ans=0.0 2024-08-17 23:59:30,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3584920.0, ans=0.09899494936611666 2024-08-17 23:59:31,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.49 vs. limit=22.5 2024-08-17 23:59:39,325 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
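The `Whitening: ... metric=... vs. limit=...` lines compare a whiteness statistic of a layer's activations against a schedule limit; the penalty only engages when the metric exceeds the limit. One standard whiteness measure with the right behaviour (this is an assumed formula for illustration, not necessarily the exact one in `scaling.py`) is d·tr(C²)/tr(C)², which equals 1.0 when the channel covariance C is proportional to the identity and grows with eigenvalue spread:

```python
def whitening_metric(feats):
    """Whiteness of a (frames x channels) activation matrix given as a
    list of rows: d * tr(C @ C) / tr(C)**2 for the channel covariance C.
    Equals 1.0 for perfectly white (isotropic) activations; larger values
    mean a more lopsided eigenvalue spectrum."""
    n, d = len(feats), len(feats[0])
    means = [sum(col) / n for col in zip(*feats)]
    x = [[v - m for v, m in zip(row, means)] for row in feats]
    # Channel covariance C[i][j] over the n frames
    c = [[sum(x[k][i] * x[k][j] for k in range(n)) / n
          for j in range(d)] for i in range(d)]
    tr_c = sum(c[i][i] for i in range(d))
    tr_c2 = sum(c[i][j] * c[j][i] for i in range(d) for j in range(d))
    return d * tr_c2 / (tr_c ** 2)

# Four frames whose covariance is 0.5 * I: metric is exactly 1.0 (white)
print(whitening_metric([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]))
```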
14 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-17 23:59:42,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3585020.0, ans=0.0 2024-08-17 23:59:57,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3585120.0, ans=0.125 2024-08-18 00:00:05,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-18 00:00:13,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 900, loss[loss=0.1004, beats_loss=0.01217, ecapa_loss=0.0001391, whisper_loss=0.0868, over 16919.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01032, ecapa_loss=0.0001433, whisper_loss=0.08826, over 3770384.88 frames. ], batch size: 69, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:00:20,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3585320.0, ans=0.125 2024-08-18 00:00:31,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3585420.0, ans=0.125 2024-08-18 00:00:32,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3585420.0, ans=0.125 2024-08-18 00:00:34,632 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0 2024-08-18 00:00:37,155 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-18 00:00:38,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3585420.0, ans=0.125 2024-08-18 00:00:39,689 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
22 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 00:00:40,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3585520.0, ans=0.125 2024-08-18 00:00:42,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3585520.0, ans=0.05 2024-08-18 00:00:45,307 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 00:00:49,305 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 00:00:50,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3585520.0, ans=0.125 2024-08-18 00:00:54,481 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 27 from Vox, 19 fro AS 2024-08-18 00:01:02,842 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 00:01:12,654 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 00:01:20,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 950, loss[loss=0.1024, beats_loss=0.009972, ecapa_loss=0.0001188, whisper_loss=0.09124, over 20048.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01027, ecapa_loss=0.0001433, whisper_loss=0.08886, over 3764656.09 frames. 
], batch size: 73, lr: 2.47e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:01:21,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3585820.0, ans=0.0 2024-08-18 00:01:21,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3585820.0, ans=0.1 2024-08-18 00:01:36,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3585920.0, ans=0.2 2024-08-18 00:01:41,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.321e+01 2.549e+01 2.773e+01 6.184e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 00:02:08,492 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 00:02:08,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3586120.0, ans=0.1 2024-08-18 00:02:15,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3586220.0, ans=0.2 2024-08-18 00:02:27,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3586320.0, ans=0.2 2024-08-18 00:02:28,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1000, loss[loss=0.1158, beats_loss=0.01048, ecapa_loss=0.0001088, whisper_loss=0.1042, over 17634.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01029, ecapa_loss=0.0001429, whisper_loss=0.08878, over 3761262.11 frames. ], batch size: 67, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:02:39,883 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 00:02:59,376 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
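Between batch 950 and batch 1000 the logged `grad_scale` drops from 1.152921504606847e+18 (2^60) to exactly half that value, the signature of an AMP-style grad scaler backing off after an overflow step. A simplified sketch of that update rule (modelled loosely on `torch.cuda.amp.GradScaler`'s `backoff_factor`/`growth_factor`/`growth_interval` semantics; names and the exact growth bookkeeping are illustrative):

```python
def update_grad_scale(scale, found_inf, steps_since_growth,
                      growth_factor=2.0, backoff_factor=0.5,
                      growth_interval=2000):
    """Simplified AMP scale update: halve the scale when the step saw an
    inf/nan gradient, double it after growth_interval clean steps.
    Returns (new_scale, new_steps_since_growth)."""
    if found_inf:
        return scale * backoff_factor, 0
    steps_since_growth += 1
    if steps_since_growth >= growth_interval:
        return scale * growth_factor, 0
    return scale, steps_since_growth

# The halving seen at batch 1000: 2^60 -> 2^59
scale, _ = update_grad_scale(2.0 ** 60, found_inf=True, steps_since_growth=0)
print(scale)  # → 5.764607523034235e+17
```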
17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 00:03:02,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3586520.0, ans=0.125 2024-08-18 00:03:10,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3586620.0, ans=0.125 2024-08-18 00:03:20,716 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 00:03:22,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3586720.0, ans=0.125 2024-08-18 00:03:25,943 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 00:03:29,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3586720.0, ans=0.125 2024-08-18 00:03:36,345 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1050, loss[loss=0.1014, beats_loss=0.01104, ecapa_loss=0.000117, whisper_loss=0.08919, over 21931.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01034, ecapa_loss=0.0001427, whisper_loss=0.08855, over 3769456.28 frames. ], batch size: 85, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:03:43,086 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 30 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 00:03:56,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3586920.0, ans=0.0 2024-08-18 00:03:57,046 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.345e+01 2.573e+01 2.782e+01 6.018e+01, threshold=5.145e+01, percent-clipped=1.0 2024-08-18 00:04:07,240 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 00:04:17,874 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 00:04:22,981 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 00:04:23,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.06 vs. limit=10.0 2024-08-18 00:04:32,701 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 00:04:43,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1100, loss[loss=0.09266, beats_loss=0.009532, ecapa_loss=0.0001419, whisper_loss=0.08171, over 21353.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0103, ecapa_loss=0.0001421, whisper_loss=0.08961, over 3798733.32 frames. ], batch size: 83, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:04:58,100 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 00:05:27,163 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 00:05:28,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3587620.0, ans=0.1 2024-08-18 00:05:31,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3587620.0, ans=0.0 2024-08-18 00:05:37,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0 2024-08-18 00:05:51,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1150, loss[loss=0.1086, beats_loss=0.01156, ecapa_loss=0.000123, whisper_loss=0.09578, over 19198.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001427, whisper_loss=0.08955, over 3809046.08 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:06:02,546 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 00:06:03,978 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 9 from Vox, 25 fro AS 2024-08-18 00:06:12,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.421e+01 2.640e+01 3.029e+01 4.672e+01, threshold=5.280e+01, percent-clipped=0.0 2024-08-18 00:06:13,918 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 00:06:17,404 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.697e+05 2024-08-18 00:06:24,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.35 vs. limit=22.5 2024-08-18 00:06:27,348 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.322e+01 2024-08-18 00:06:32,241 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 00:06:36,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3588120.0, ans=0.125 2024-08-18 00:06:41,821 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 00:06:48,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3588220.0, ans=0.125 2024-08-18 00:06:50,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3588220.0, ans=0.125 2024-08-18 00:07:00,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1200, loss[loss=0.1104, beats_loss=0.01002, ecapa_loss=0.0001585, whisper_loss=0.09883, over 15267.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001429, whisper_loss=0.08923, over 3809008.23 frames. ], batch size: 58, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:07:10,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3588320.0, ans=0.125 2024-08-18 00:07:14,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3588420.0, ans=0.125 2024-08-18 00:07:26,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3588520.0, ans=0.0 2024-08-18 00:07:28,645 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-18 00:07:38,322 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.19 vs. 
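The many `ScheduledFloat` lines report hyperparameters (skip rates, balancer probabilities, bypass scale minimums) whose values depend on `batch_count`; in icefall's `scaling.py` these follow piecewise-linear schedules over the batch count, clamped at the endpoints. A minimal re-implementation of that interpolation (the breakpoints in the example are illustrative, not this run's actual schedules):

```python
def scheduled_float(schedule, batch_count):
    """Piecewise-linear interpolation over sorted (batch_count, value)
    breakpoints, clamped to the first/last value outside the range."""
    schedule = sorted(schedule)
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a skip rate annealed linearly from 0.5 to 0.0 over 20000 batches
print(scheduled_float([(0, 0.5), (20000, 0.0)], 10000))  # → 0.25
```

At `batch_count=3.58e+06`, far past any warmup breakpoints, most schedules have long since settled at their final values, which is why the same constants (`ans=0.125`, `ans=0.2`, ...) repeat throughout this stretch of the log.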
limit=15.0 2024-08-18 00:07:42,849 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.280e-03 2024-08-18 00:07:44,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3588620.0, ans=0.125 2024-08-18 00:07:50,104 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs. limit=15.0 2024-08-18 00:08:02,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.38 vs. limit=15.0 2024-08-18 00:08:10,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1250, loss[loss=0.12, beats_loss=0.008605, ecapa_loss=0.0001275, whisper_loss=0.1102, over 20000.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01038, ecapa_loss=0.0001438, whisper_loss=0.0893, over 3795764.25 frames. ], batch size: 75, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:08:23,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3588920.0, ans=0.0 2024-08-18 00:08:30,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.265e+01 2.496e+01 2.765e+01 1.417e+02, threshold=4.991e+01, percent-clipped=1.0 2024-08-18 00:08:31,962 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.511e+00 2024-08-18 00:08:35,571 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-18 00:08:43,887 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 00:08:46,799 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
22 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 00:08:48,060 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 35 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 00:08:51,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3589120.0, ans=0.2 2024-08-18 00:08:59,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3589120.0, ans=0.125 2024-08-18 00:09:17,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1300, loss[loss=0.1004, beats_loss=0.01322, ecapa_loss=0.0001238, whisper_loss=0.08593, over 19128.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01047, ecapa_loss=0.0001441, whisper_loss=0.0891, over 3806436.82 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:09:19,987 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 00:09:25,471 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 00:09:33,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3589420.0, ans=0.0 2024-08-18 00:09:49,844 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 00:10:09,278 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 00:10:26,944 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 00:10:30,872 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1350, loss[loss=0.1237, beats_loss=0.007379, ecapa_loss=0.0001639, whisper_loss=0.1147, over 20399.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001435, whisper_loss=0.0898, over 3829795.87 frames. 
], batch size: 76, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:10:42,862 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 30 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 00:10:44,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3589920.0, ans=0.1 2024-08-18 00:10:50,433 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.253e+01 2.540e+01 2.866e+01 1.653e+02, threshold=5.079e+01, percent-clipped=1.0 2024-08-18 00:11:00,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3590020.0, ans=0.125 2024-08-18 00:11:02,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.91 vs. limit=22.5 2024-08-18 00:11:04,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3590020.0, ans=0.0 2024-08-18 00:11:06,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3590020.0, ans=0.125 2024-08-18 00:11:21,441 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 00:11:21,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3590120.0, ans=0.125 2024-08-18 00:11:26,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3590220.0, ans=0.0 2024-08-18 00:11:40,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1400, loss[loss=0.1068, beats_loss=0.008801, ecapa_loss=0.000143, whisper_loss=0.09657, over 15460.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01048, ecapa_loss=0.0001435, whisper_loss=0.08913, over 3842696.79 frames. 
], batch size: 56, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:11:54,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3590420.0, ans=0.07 2024-08-18 00:12:03,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3590420.0, ans=0.1 2024-08-18 00:12:17,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.26 vs. limit=10.0 2024-08-18 00:12:34,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3590620.0, ans=0.0 2024-08-18 00:12:51,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1450, loss[loss=0.09616, beats_loss=0.01242, ecapa_loss=0.0001194, whisper_loss=0.08255, over 18392.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.0001423, whisper_loss=0.08866, over 3841166.54 frames. ], batch size: 74, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:13:08,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3590820.0, ans=0.1 2024-08-18 00:13:23,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.734e+01 2.250e+01 2.508e+01 2.687e+01 4.238e+01, threshold=5.016e+01, percent-clipped=0.0 2024-08-18 00:13:30,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3590920.0, ans=0.2 2024-08-18 00:13:37,682 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 00:13:47,315 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
28 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-18 00:13:59,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3591120.0, ans=0.125 2024-08-18 00:14:13,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3591220.0, ans=0.125 2024-08-18 00:14:35,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1500, loss[loss=0.108, beats_loss=0.01102, ecapa_loss=0.0001437, whisper_loss=0.09557, over 22689.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01055, ecapa_loss=0.0001425, whisper_loss=0.08841, over 3811656.77 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:14:55,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3591320.0, ans=0.0 2024-08-18 00:15:03,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2024-08-18 00:15:37,401 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 00:15:44,175 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-18 00:15:52,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3591620.0, ans=0.07 2024-08-18 00:16:19,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3591720.0, ans=0.125 2024-08-18 00:16:25,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3591720.0, ans=0.1 2024-08-18 00:16:26,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3591820.0, ans=0.2 2024-08-18 00:16:27,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1550, loss[loss=0.1162, beats_loss=0.01005, ecapa_loss=0.0001326, whisper_loss=0.1049, over 22419.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01056, ecapa_loss=0.0001418, whisper_loss=0.08894, over 3807836.31 frames. ], batch size: 86, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:16:30,261 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 24 from Vox, 19 fro AS 2024-08-18 00:16:40,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.73 vs. 
limit=15.0 2024-08-18 00:17:02,556 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.260e+01 2.619e+01 2.888e+01 4.592e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-18 00:17:03,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3591920.0, ans=0.125 2024-08-18 00:17:05,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3591920.0, ans=0.125 2024-08-18 00:17:13,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3592020.0, ans=0.125 2024-08-18 00:17:37,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3592120.0, ans=0.2 2024-08-18 00:17:50,702 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 00:18:16,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1600, loss[loss=0.08604, beats_loss=0.01128, ecapa_loss=0.0001494, whisper_loss=0.07327, over 15662.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001418, whisper_loss=0.08945, over 3796578.79 frames. ], batch size: 63, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:18:18,746 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 00:18:19,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3592320.0, ans=0.125 2024-08-18 00:18:41,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3592420.0, ans=0.125 2024-08-18 00:18:57,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3592520.0, ans=0.04949747468305833 2024-08-18 00:19:00,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3592520.0, ans=10.0 2024-08-18 00:19:03,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=15.0 2024-08-18 00:19:13,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3592620.0, ans=0.125 2024-08-18 00:19:14,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0 2024-08-18 00:19:29,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3592720.0, ans=0.125 2024-08-18 00:19:38,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1650, loss[loss=0.114, beats_loss=0.008148, ecapa_loss=0.0001818, whisper_loss=0.104, over 17533.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001417, whisper_loss=0.08983, over 3824589.15 frames. ], batch size: 72, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:19:42,842 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
28 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 00:19:44,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3592820.0, ans=0.1 2024-08-18 00:19:49,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3592820.0, ans=0.125 2024-08-18 00:19:54,316 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 00:19:59,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.229e+01 2.477e+01 2.865e+01 9.039e+01, threshold=4.953e+01, percent-clipped=1.0 2024-08-18 00:20:06,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3593020.0, ans=0.125 2024-08-18 00:20:31,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3593120.0, ans=0.125 2024-08-18 00:20:34,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3593220.0, ans=0.125 2024-08-18 00:20:35,591 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 00:20:38,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.20 vs. 
limit=5.0 2024-08-18 00:20:41,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3593220.0, ans=0.1 2024-08-18 00:20:42,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3593220.0, ans=0.1 2024-08-18 00:20:46,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3593320.0, ans=0.125 2024-08-18 00:20:46,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3593320.0, ans=0.0 2024-08-18 00:20:47,731 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1700, loss[loss=0.0928, beats_loss=0.01034, ecapa_loss=0.0001598, whisper_loss=0.08086, over 20627.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001422, whisper_loss=0.09044, over 3845927.55 frames. ], batch size: 86, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:21:03,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3593420.0, ans=0.0 2024-08-18 00:21:07,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-18 00:21:27,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3593620.0, ans=0.2 2024-08-18 00:21:46,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3593720.0, ans=0.125 2024-08-18 00:21:54,629 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1750, loss[loss=0.08687, beats_loss=0.009882, ecapa_loss=0.0001799, whisper_loss=0.07519, over 14663.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001427, whisper_loss=0.09021, over 3820278.47 frames. ], batch size: 61, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:22:08,608 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 00:22:15,136 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.504e+01 2.766e+01 3.067e+01 1.559e+02, threshold=5.531e+01, percent-clipped=2.0 2024-08-18 00:22:18,584 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593920.0, ans=0.125 2024-08-18 00:22:23,345 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 00:22:32,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3594020.0, ans=0.125 2024-08-18 00:22:32,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-18 00:22:43,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3594120.0, ans=0.1 2024-08-18 00:22:46,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3594120.0, ans=0.125 2024-08-18 00:22:48,735 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 00:22:51,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3594220.0, ans=0.1 2024-08-18 00:22:55,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3594220.0, ans=0.09899494936611666 2024-08-18 00:22:58,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3594220.0, ans=0.0 2024-08-18 00:23:02,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1800, loss[loss=0.08489, beats_loss=0.01454, ecapa_loss=0.0001156, whisper_loss=0.0692, over 17143.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01037, ecapa_loss=0.0001434, whisper_loss=0.08981, over 3832504.29 frames. ], batch size: 69, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:23:20,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3594420.0, ans=0.2 2024-08-18 00:23:21,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3594420.0, ans=0.0 2024-08-18 00:23:52,798 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.157e-03 2024-08-18 00:23:55,128 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 00:24:09,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1850, loss[loss=0.1199, beats_loss=0.008178, ecapa_loss=0.0001138, whisper_loss=0.1106, over 15835.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01045, ecapa_loss=0.0001429, whisper_loss=0.08912, over 3837805.18 frames. 
], batch size: 58, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:24:18,291 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2024-08-18 00:24:20,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3594820.0, ans=0.95 2024-08-18 00:24:22,702 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 00:24:29,357 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.383e+01 2.607e+01 2.993e+01 6.397e+01, threshold=5.213e+01, percent-clipped=1.0 2024-08-18 00:24:36,416 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 00:24:44,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3595020.0, ans=0.125 2024-08-18 00:24:57,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3595120.0, ans=0.125 2024-08-18 00:24:59,933 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 00:25:02,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3595120.0, ans=0.125 2024-08-18 00:25:04,509 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 00:25:15,383 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 00:25:17,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1900, loss[loss=0.1122, beats_loss=0.01038, ecapa_loss=0.0001184, whisper_loss=0.1006, over 17988.00 frames. 
], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001427, whisper_loss=0.08912, over 3826658.17 frames. ], batch size: 68, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:25:41,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3595420.0, ans=0.2 2024-08-18 00:25:48,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3595520.0, ans=0.125 2024-08-18 00:25:48,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3595520.0, ans=0.125 2024-08-18 00:25:49,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3595520.0, ans=0.125 2024-08-18 00:26:01,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3595620.0, ans=0.125 2024-08-18 00:26:04,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3595620.0, ans=0.125 2024-08-18 00:26:05,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-18 00:26:08,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3595620.0, ans=0.125 2024-08-18 00:26:10,587 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 00:26:23,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 1950, loss[loss=0.0937, beats_loss=0.01185, ecapa_loss=0.0001298, whisper_loss=0.08056, over 21131.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01054, ecapa_loss=0.0001428, whisper_loss=0.08846, over 3808604.99 frames. 
], batch size: 85, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:26:29,459 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 00:26:29,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.46 vs. limit=22.5 2024-08-18 00:26:34,674 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 00:26:36,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3595920.0, ans=0.1 2024-08-18 00:26:37,693 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 00:26:39,202 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 00:26:39,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3595920.0, ans=0.0 2024-08-18 00:26:39,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3595920.0, ans=0.2 2024-08-18 00:26:44,849 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.310e+01 2.471e+01 2.786e+01 3.161e+02, threshold=4.942e+01, percent-clipped=2.0 2024-08-18 00:26:47,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3595920.0, ans=0.0 2024-08-18 00:26:53,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3596020.0, ans=0.125 2024-08-18 00:26:55,895 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 00:27:16,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.56 vs. limit=15.0 2024-08-18 00:27:21,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3596220.0, ans=0.125 2024-08-18 00:27:32,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2000, loss[loss=0.1006, beats_loss=0.01001, ecapa_loss=0.000161, whisper_loss=0.08893, over 13783.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0105, ecapa_loss=0.0001419, whisper_loss=0.08859, over 3817121.75 frames. ], batch size: 55, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:28:12,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3596620.0, ans=0.125 2024-08-18 00:28:17,812 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 00:28:24,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3596620.0, ans=0.125 2024-08-18 00:28:27,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3596720.0, ans=0.0 2024-08-18 00:28:34,219 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 00:28:40,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2050, loss[loss=0.07884, beats_loss=0.01226, ecapa_loss=0.0001286, whisper_loss=0.06529, over 15262.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01047, ecapa_loss=0.0001427, whisper_loss=0.08934, over 3841516.99 frames. 
], batch size: 62, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:28:49,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3596820.0, ans=0.125 2024-08-18 00:28:49,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3596820.0, ans=15.0 2024-08-18 00:28:55,411 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.553e-03 2024-08-18 00:29:00,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.408e+01 2.657e+01 2.912e+01 3.324e+02, threshold=5.315e+01, percent-clipped=5.0 2024-08-18 00:29:08,589 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 00:29:16,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597020.0, ans=0.1 2024-08-18 00:29:29,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.91 vs. limit=10.0 2024-08-18 00:29:39,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3597220.0, ans=0.125 2024-08-18 00:29:46,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2100, loss[loss=0.08904, beats_loss=0.00738, ecapa_loss=0.0001661, whisper_loss=0.08, over 14513.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.08883, over 3837214.19 frames. 
], batch size: 55, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:29:48,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3597320.0, ans=0.125 2024-08-18 00:29:49,487 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 00:29:59,529 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 00:30:18,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3597520.0, ans=0.0 2024-08-18 00:30:26,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.41 vs. limit=22.5 2024-08-18 00:30:50,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2150, loss[loss=0.1056, beats_loss=0.01004, ecapa_loss=0.0001185, whisper_loss=0.09435, over 22623.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01064, ecapa_loss=0.0001415, whisper_loss=0.0888, over 3835318.47 frames. ], batch size: 85, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:30:51,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3597820.0, ans=0.1 2024-08-18 00:30:59,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3597820.0, ans=0.2 2024-08-18 00:31:09,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.94 vs. 
limit=22.5 2024-08-18 00:31:09,716 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.640e+01 2.312e+01 2.569e+01 2.886e+01 3.776e+01, threshold=5.138e+01, percent-clipped=0.0 2024-08-18 00:31:12,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3597920.0, ans=0.2 2024-08-18 00:31:36,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3598120.0, ans=0.1 2024-08-18 00:31:53,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2200, loss[loss=0.09331, beats_loss=0.009587, ecapa_loss=0.0001415, whisper_loss=0.08231, over 18672.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001418, whisper_loss=0.09, over 3837782.44 frames. ], batch size: 71, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:31:58,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3598320.0, ans=0.0 2024-08-18 00:32:02,769 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 00:32:03,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3598320.0, ans=0.1 2024-08-18 00:32:12,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2024-08-18 00:32:20,518 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 00:32:35,527 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 00:32:40,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3598620.0, ans=0.2 2024-08-18 00:32:46,737 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 00:32:54,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-18 00:32:56,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2250, loss[loss=0.1063, beats_loss=0.01079, ecapa_loss=0.0001152, whisper_loss=0.09434, over 23618.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.0904, over 3840426.53 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:33:05,245 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 00:33:14,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.985e+01 2.384e+01 2.636e+01 2.985e+01 4.342e+01, threshold=5.271e+01, percent-clipped=0.0 2024-08-18 00:33:33,582 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 00:33:35,918 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 00:33:39,649 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-18 00:33:45,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3599220.0, ans=0.125 2024-08-18 00:33:54,787 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
27 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 00:33:58,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2300, loss[loss=0.1032, beats_loss=0.01137, ecapa_loss=9.432e-05, whisper_loss=0.09085, over 18118.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01058, ecapa_loss=0.0001411, whisper_loss=0.09064, over 3870395.54 frames. ], batch size: 66, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:34:16,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=3599420.0, ans=0.05 2024-08-18 00:34:25,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2024-08-18 00:34:26,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3599520.0, ans=0.125 2024-08-18 00:34:29,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3599520.0, ans=0.125 2024-08-18 00:34:45,642 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 00:34:55,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3599720.0, ans=0.125 2024-08-18 00:35:02,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2350, loss[loss=0.1324, beats_loss=0.00972, ecapa_loss=0.0001341, whisper_loss=0.1214, over 23245.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01063, ecapa_loss=0.0001415, whisper_loss=0.09102, over 3892890.35 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:35:10,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. 
limit=6.0 2024-08-18 00:35:14,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3599920.0, ans=0.125 2024-08-18 00:35:21,239 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+01 2.275e+01 2.503e+01 2.910e+01 3.749e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 00:35:32,645 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 00:36:04,334 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 00:36:07,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2400, loss[loss=0.1045, beats_loss=0.01162, ecapa_loss=0.0001398, whisper_loss=0.09147, over 21991.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001429, whisper_loss=0.09089, over 3893834.93 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:36:13,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3600320.0, ans=0.035 2024-08-18 00:36:13,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.15 vs. limit=15.0 2024-08-18 00:36:24,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3600420.0, ans=0.0 2024-08-18 00:36:35,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3600520.0, ans=0.125 2024-08-18 00:36:45,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3600620.0, ans=0.0 2024-08-18 00:36:46,850 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 00:36:52,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3600620.0, ans=0.125 2024-08-18 00:37:02,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3600720.0, ans=0.035 2024-08-18 00:37:10,453 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2450, loss[loss=0.0834, beats_loss=0.01284, ecapa_loss=0.0001225, whisper_loss=0.06934, over 15700.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001437, whisper_loss=0.09101, over 3875270.15 frames. ], batch size: 64, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:37:19,442 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 00:37:21,850 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 00:37:28,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.316e+01 2.565e+01 2.962e+01 6.294e+01, threshold=5.130e+01, percent-clipped=1.0 2024-08-18 00:37:48,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3601120.0, ans=0.125 2024-08-18 00:38:02,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3601220.0, ans=0.125 2024-08-18 00:38:04,727 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 00:38:12,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3601320.0, ans=0.0 2024-08-18 00:38:13,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2500, loss[loss=0.09461, beats_loss=0.01177, ecapa_loss=0.0001209, whisper_loss=0.08163, over 22003.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.000144, whisper_loss=0.09065, over 3857520.93 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:38:19,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2024-08-18 00:38:19,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3601320.0, ans=0.0 2024-08-18 00:38:20,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.58 vs. limit=22.5 2024-08-18 00:38:43,212 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 00:38:44,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=12.0 2024-08-18 00:38:50,600 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-18 00:39:02,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:05,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:09,120 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
19 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 00:39:12,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:13,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3601720.0, ans=0.125 2024-08-18 00:39:15,430 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2550, loss[loss=0.08489, beats_loss=0.01315, ecapa_loss=0.0001206, whisper_loss=0.07054, over 23033.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001431, whisper_loss=0.0915, over 3860273.10 frames. ], batch size: 93, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:39:16,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.89 vs. limit=22.5 2024-08-18 00:39:22,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3601820.0, ans=0.0 2024-08-18 00:39:23,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3601820.0, ans=0.025 2024-08-18 00:39:34,102 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.759e+01 2.261e+01 2.549e+01 2.807e+01 3.668e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-18 00:39:35,421 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 00:39:42,073 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 00:39:42,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3602020.0, ans=0.0 2024-08-18 00:39:43,311 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
14 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-18 00:39:54,718 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 00:40:13,885 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 00:40:15,077 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-18 00:40:15,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=22.5 2024-08-18 00:40:18,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2600, loss[loss=0.1138, beats_loss=0.009532, ecapa_loss=0.0001435, whisper_loss=0.1028, over 15373.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001434, whisper_loss=0.09101, over 3853481.62 frames. ], batch size: 59, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:40:36,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3602420.0, ans=0.95 2024-08-18 00:40:42,593 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:40:44,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. 
limit=15.0 2024-08-18 00:40:50,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3602520.0, ans=0.025 2024-08-18 00:40:55,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3602620.0, ans=0.0 2024-08-18 00:41:05,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3602620.0, ans=0.125 2024-08-18 00:41:21,183 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2650, loss[loss=0.09835, beats_loss=0.01165, ecapa_loss=0.0001514, whisper_loss=0.08519, over 17541.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001445, whisper_loss=0.09073, over 3854372.74 frames. ], batch size: 73, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:41:34,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3602920.0, ans=0.125 2024-08-18 00:41:37,234 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 00:41:38,457 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 00:41:39,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.278e+01 2.614e+01 3.095e+01 1.399e+02, threshold=5.228e+01, percent-clipped=3.0 2024-08-18 00:41:49,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3603020.0, ans=0.2 2024-08-18 00:41:56,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3603020.0, ans=0.125 2024-08-18 00:42:06,274 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 00:42:16,160 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 13 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 00:42:16,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.50 vs. limit=10.0 2024-08-18 00:42:17,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.13 vs. limit=6.0 2024-08-18 00:42:23,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2700, loss[loss=0.09422, beats_loss=0.01152, ecapa_loss=0.0001619, whisper_loss=0.08107, over 18935.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001447, whisper_loss=0.09032, over 3872507.50 frames. ], batch size: 79, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:42:44,337 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 00:42:50,558 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 00:42:54,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3603520.0, ans=0.125 2024-08-18 00:42:59,255 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-18 00:43:12,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3603720.0, ans=0.0 2024-08-18 00:43:14,865 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 00:43:16,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.51 vs. 
limit=12.0 2024-08-18 00:43:17,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=12.0 2024-08-18 00:43:19,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3603720.0, ans=0.125 2024-08-18 00:43:23,757 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 00:43:26,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2750, loss[loss=0.09592, beats_loss=0.01225, ecapa_loss=0.0001377, whisper_loss=0.08229, over 15620.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001466, whisper_loss=0.09007, over 3864469.96 frames. ], batch size: 63, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:43:32,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3603820.0, ans=0.1 2024-08-18 00:43:44,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.315e+01 2.507e+01 2.815e+01 4.002e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-18 00:43:53,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3604020.0, ans=0.1 2024-08-18 00:43:55,257 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 00:44:07,965 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 00:44:10,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3604120.0, ans=0.125 2024-08-18 00:44:21,062 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 00:44:21,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3604220.0, ans=0.035 2024-08-18 00:44:21,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3604220.0, ans=0.2 2024-08-18 00:44:24,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3604220.0, ans=0.1 2024-08-18 00:44:29,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2800, loss[loss=0.08727, beats_loss=0.01261, ecapa_loss=0.0001084, whisper_loss=0.07358, over 18587.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001464, whisper_loss=0.09029, over 3874918.70 frames. ], batch size: 73, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:44:37,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3604320.0, ans=0.125 2024-08-18 00:44:46,240 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 00:44:49,269 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 00:44:54,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3604420.0, ans=0.125 2024-08-18 00:44:55,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3604520.0, ans=0.125 2024-08-18 00:45:08,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3604620.0, ans=0.1 2024-08-18 00:45:17,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.99 vs. 
limit=15.0 2024-08-18 00:45:29,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3604720.0, ans=0.0 2024-08-18 00:45:31,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3604720.0, ans=0.2 2024-08-18 00:45:34,111 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.006e-03 2024-08-18 00:45:34,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2850, loss[loss=0.09781, beats_loss=0.009414, ecapa_loss=0.0001603, whisper_loss=0.08679, over 17144.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001446, whisper_loss=0.09086, over 3854586.89 frames. ], batch size: 67, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:45:37,606 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 00:45:40,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3604820.0, ans=0.125 2024-08-18 00:45:52,326 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 00:45:54,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.370e+01 2.583e+01 2.901e+01 2.742e+02, threshold=5.166e+01, percent-clipped=3.0 2024-08-18 00:46:01,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3605020.0, ans=0.125 2024-08-18 00:46:02,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3605020.0, ans=0.0 2024-08-18 00:46:24,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3605120.0, ans=0.125 2024-08-18 00:46:40,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2900, loss[loss=0.09121, beats_loss=0.01202, ecapa_loss=0.0001179, whisper_loss=0.07801, over 15511.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001456, whisper_loss=0.09072, over 3884184.24 frames. ], batch size: 60, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:46:45,736 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 00:46:48,212 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 00:46:52,341 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 00:46:52,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3605420.0, ans=0.125 2024-08-18 00:46:56,279 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
26 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 00:46:58,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3605420.0, ans=0.125 2024-08-18 00:47:03,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3605420.0, ans=0.125 2024-08-18 00:47:04,354 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 00:47:45,557 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 2950, loss[loss=0.1054, beats_loss=0.0101, ecapa_loss=0.0001355, whisper_loss=0.09392, over 20140.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001452, whisper_loss=0.09064, over 3893589.85 frames. ], batch size: 80, lr: 2.47e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:47:50,591 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 00:48:03,351 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 00:48:04,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.015e+01 2.406e+01 2.598e+01 2.862e+01 3.819e+01, threshold=5.197e+01, percent-clipped=0.0 2024-08-18 00:48:22,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2024-08-18 00:48:25,260 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 00:48:33,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3606120.0, ans=0.09899494936611666 2024-08-18 00:48:34,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3606220.0, ans=0.2 2024-08-18 00:48:47,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3000, loss[loss=0.1306, beats_loss=0.01068, ecapa_loss=0.0001468, whisper_loss=0.1184, over 23569.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.0001462, whisper_loss=0.09061, over 3931626.48 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:48:47,608 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 00:49:20,098 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005235, whisper_loss=0.2477, over 922467.00 frames. 2024-08-18 00:49:31,336 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3508, 3.1905, 4.0359, 3.9316], device='cuda:3') 2024-08-18 00:49:35,419 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on SV_voxceleb1: loss=0.004164, beats_loss=0, ecapa_loss=0.0004164, whisper_loss=0, over 939242.00 frames. 2024-08-18 00:50:01,213 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.0086, 1.4858, 1.5747, 1.5110, 1.7463, 1.3920, 1.5619, 1.5268], device='cuda:3') 2024-08-18 00:51:10,612 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on AT_audioset: loss=0.02327, beats_loss=0.02327, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 00:51:10,616 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 00:51:31,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3606420.0, ans=0.125 2024-08-18 00:51:49,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3606620.0, ans=0.0 2024-08-18 00:51:51,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3606620.0, ans=0.1 2024-08-18 00:51:52,785 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 00:52:00,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3606720.0, ans=0.125 2024-08-18 00:52:02,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3606720.0, ans=0.2 2024-08-18 00:52:14,465 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2024-08-18 00:52:14,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3050, loss[loss=0.1077, beats_loss=0.01085, ecapa_loss=0.0001916, whisper_loss=0.09492, over 22102.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001467, whisper_loss=0.09126, over 3911199.49 frames. 
], batch size: 96, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 00:52:16,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3606820.0, ans=0.1 2024-08-18 00:52:23,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3606820.0, ans=0.125 2024-08-18 00:52:29,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.10 vs. limit=22.5 2024-08-18 00:52:33,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.357e+01 2.578e+01 2.867e+01 5.974e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-18 00:52:37,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3606920.0, ans=0.0 2024-08-18 00:52:46,055 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-18 00:52:46,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3607020.0, ans=0.125 2024-08-18 00:52:47,485 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 31 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 00:52:57,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3607120.0, ans=0.1 2024-08-18 00:53:09,546 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 00:53:09,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3607220.0, ans=0.0 2024-08-18 00:53:17,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3607320.0, ans=0.1 2024-08-18 00:53:18,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3100, loss[loss=0.09094, beats_loss=0.01212, ecapa_loss=0.0001529, whisper_loss=0.07728, over 22023.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01061, ecapa_loss=0.0001465, whisper_loss=0.09122, over 3904381.72 frames. ], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:53:24,472 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 00:53:32,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3607420.0, ans=0.0 2024-08-18 00:53:33,338 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 00:53:47,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3607520.0, ans=0.125 2024-08-18 00:53:54,494 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 00:54:07,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3607720.0, ans=0.125 2024-08-18 00:54:10,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3607720.0, ans=0.125 2024-08-18 00:54:10,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3607720.0, ans=0.125 2024-08-18 00:54:17,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.09 vs. limit=15.0 2024-08-18 00:54:18,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3607720.0, ans=0.0 2024-08-18 00:54:20,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3150, loss[loss=0.1163, beats_loss=0.01077, ecapa_loss=0.0001354, whisper_loss=0.1042, over 23512.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01064, ecapa_loss=0.0001463, whisper_loss=0.09124, over 3911172.83 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:54:23,585 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 12 from Vox, 44 fro AS 2024-08-18 00:54:23,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3607820.0, ans=0.125 2024-08-18 00:54:34,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.34 vs. limit=15.0 2024-08-18 00:54:37,597 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
27 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 00:54:41,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.301e+01 2.605e+01 2.918e+01 7.177e+01, threshold=5.210e+01, percent-clipped=2.0 2024-08-18 00:54:51,812 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 00:54:58,244 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 00:54:58,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3608120.0, ans=0.125 2024-08-18 00:55:16,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3608220.0, ans=0.125 2024-08-18 00:55:24,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3200, loss[loss=0.1216, beats_loss=0.006921, ecapa_loss=0.0001692, whisper_loss=0.113, over 18545.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01052, ecapa_loss=0.0001461, whisper_loss=0.09213, over 3906147.31 frames. ], batch size: 74, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:55:33,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3608320.0, ans=0.1 2024-08-18 00:55:46,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3608420.0, ans=0.2 2024-08-18 00:55:52,989 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 00:55:55,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3608520.0, ans=0.125 2024-08-18 00:55:55,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3608520.0, ans=0.125 2024-08-18 00:55:55,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3608520.0, ans=0.125 2024-08-18 00:56:03,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3608620.0, ans=0.2 2024-08-18 00:56:04,710 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 00:56:08,187 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 00:56:28,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3250, loss[loss=0.09516, beats_loss=0.01106, ecapa_loss=0.000147, whisper_loss=0.08263, over 21673.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01059, ecapa_loss=0.0001456, whisper_loss=0.09176, over 3900197.75 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:56:44,616 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 00:56:48,232 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.222e+01 2.522e+01 2.821e+01 3.582e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-18 00:56:50,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3608920.0, ans=0.125 2024-08-18 00:57:31,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3300, loss[loss=0.1113, beats_loss=0.009496, ecapa_loss=0.0001261, whisper_loss=0.1005, over 22680.00 frames. 
], tot_loss[loss=0.1035, beats_loss=0.01057, ecapa_loss=0.0001453, whisper_loss=0.09143, over 3877931.29 frames. ], batch size: 86, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:57:31,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3609320.0, ans=0.125 2024-08-18 00:57:38,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3609320.0, ans=0.1 2024-08-18 00:58:33,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3350, loss[loss=0.1221, beats_loss=0.01084, ecapa_loss=0.0001217, whisper_loss=0.11, over 23735.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01056, ecapa_loss=0.0001446, whisper_loss=0.09147, over 3864525.53 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:58:51,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3609920.0, ans=0.125 2024-08-18 00:58:52,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0 2024-08-18 00:58:53,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.852e+01 2.316e+01 2.495e+01 2.843e+01 4.202e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-18 00:58:53,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.46 vs. limit=22.5 2024-08-18 00:58:58,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.38 vs. 
limit=15.0 2024-08-18 00:59:04,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3610020.0, ans=0.0 2024-08-18 00:59:18,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3610120.0, ans=0.1 2024-08-18 00:59:24,552 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 00:59:29,439 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 00:59:29,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3610220.0, ans=0.125 2024-08-18 00:59:36,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3400, loss[loss=0.1003, beats_loss=0.01054, ecapa_loss=0.0001339, whisper_loss=0.08842, over 16156.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.09068, over 3868381.09 frames. ], batch size: 65, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 00:59:45,628 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 00:59:50,340 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 00:59:56,908 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 01:00:00,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-08-18 01:00:09,385 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
18 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 01:00:12,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3610520.0, ans=0.0 2024-08-18 01:00:27,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3610720.0, ans=0.0 2024-08-18 01:00:30,676 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0 2024-08-18 01:00:32,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=3610720.0, ans=0.2 2024-08-18 01:00:35,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=15.0 2024-08-18 01:00:39,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3450, loss[loss=0.1158, beats_loss=0.01052, ecapa_loss=0.0001651, whisper_loss=0.1037, over 23362.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001456, whisper_loss=0.09094, over 3899099.44 frames. 
], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:01:00,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.519e+01 2.832e+01 7.230e+01, threshold=5.039e+01, percent-clipped=2.0 2024-08-18 01:01:02,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3610920.0, ans=0.05 2024-08-18 01:01:13,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3611020.0, ans=0.125 2024-08-18 01:01:22,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611120.0, ans=0.1 2024-08-18 01:01:33,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3611220.0, ans=0.125 2024-08-18 01:01:34,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3611220.0, ans=0.125 2024-08-18 01:01:39,766 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 01:01:42,219 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 01:01:43,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3500, loss[loss=0.1203, beats_loss=0.01143, ecapa_loss=0.0001055, whisper_loss=0.1078, over 18335.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001457, whisper_loss=0.09093, over 3902390.81 frames. ], batch size: 68, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:01:52,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.30 vs. limit=22.5 2024-08-18 01:02:04,529 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 01:02:11,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3611520.0, ans=0.125 2024-08-18 01:02:11,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3611520.0, ans=0.125 2024-08-18 01:02:12,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3611520.0, ans=0.07 2024-08-18 01:02:13,489 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 01:02:14,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3611520.0, ans=0.125 2024-08-18 01:02:22,140 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 01:02:28,882 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 01:02:32,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-08-18 01:02:34,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3611720.0, ans=0.125 2024-08-18 01:02:48,014 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3550, loss[loss=0.1027, beats_loss=0.01203, ecapa_loss=0.0001069, whisper_loss=0.08962, over 14428.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.09098, over 3896731.73 frames. 
], batch size: 55, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:02:51,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3611820.0, ans=0.035 2024-08-18 01:02:52,269 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 01:02:52,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3611820.0, ans=0.125 2024-08-18 01:03:02,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2024-08-18 01:03:07,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3611920.0, ans=0.1 2024-08-18 01:03:09,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.315e+01 2.541e+01 2.759e+01 3.730e+01, threshold=5.082e+01, percent-clipped=0.0 2024-08-18 01:03:14,911 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 01:03:22,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3612020.0, ans=0.125 2024-08-18 01:03:29,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3612120.0, ans=0.125 2024-08-18 01:03:34,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3612120.0, ans=0.025 2024-08-18 01:03:41,024 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 01:03:54,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3600, loss[loss=0.09096, beats_loss=0.01271, ecapa_loss=0.0001276, whisper_loss=0.07697, over 19891.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.000145, whisper_loss=0.09088, over 3884893.74 frames. ], batch size: 80, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:03:57,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3612320.0, ans=0.125 2024-08-18 01:03:57,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3612320.0, ans=0.0 2024-08-18 01:04:25,314 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 01:04:25,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3612520.0, ans=0.0 2024-08-18 01:04:25,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2024-08-18 01:04:36,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3612620.0, ans=0.09899494936611666 2024-08-18 01:04:39,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3612620.0, ans=0.125 2024-08-18 01:04:43,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3612620.0, ans=0.1 2024-08-18 01:05:02,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3650, loss[loss=0.09604, beats_loss=0.009694, ecapa_loss=0.0001645, whisper_loss=0.0847, over 21612.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001454, whisper_loss=0.09103, over 3875874.43 frames. 
], batch size: 89, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:05:05,966 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:05:20,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3612920.0, ans=0.125 2024-08-18 01:05:24,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.387e+01 2.671e+01 3.011e+01 4.673e+01, threshold=5.342e+01, percent-clipped=0.0 2024-08-18 01:05:24,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3612920.0, ans=0.0 2024-08-18 01:05:27,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3612920.0, ans=0.0 2024-08-18 01:05:30,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.81 vs. limit=15.0 2024-08-18 01:05:44,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-08-18 01:05:45,337 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS 2024-08-18 01:05:56,291 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 15 from Vox, 40 from AS 2024-08-18 01:06:10,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3700, loss[loss=0.1158, beats_loss=0.007472, ecapa_loss=0.0001681, whisper_loss=0.1066, over 19721.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001458, whisper_loss=0.09116, over 3880335.88 frames. 
], batch size: 78, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:06:15,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3613320.0, ans=0.1 2024-08-18 01:06:24,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3613420.0, ans=0.1 2024-08-18 01:06:54,758 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 10 from Vox, 34 from AS 2024-08-18 01:06:56,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3613620.0, ans=0.1 2024-08-18 01:07:13,590 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 30 from Vox, 40 from AS 2024-08-18 01:07:18,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3750, loss[loss=0.1181, beats_loss=0.01061, ecapa_loss=0.0001261, whisper_loss=0.1063, over 22651.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01041, ecapa_loss=0.0001455, whisper_loss=0.09198, over 3930209.28 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:07:27,701 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.30 vs. 
limit=15.0 2024-08-18 01:07:31,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3613920.0, ans=0.125 2024-08-18 01:07:34,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3613920.0, ans=0.0 2024-08-18 01:07:37,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3613920.0, ans=0.0 2024-08-18 01:07:40,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.244e+01 2.423e+01 2.765e+01 4.056e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-18 01:08:09,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3614120.0, ans=0.125 2024-08-18 01:08:11,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-08-18 01:08:12,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.76 vs. limit=15.0 2024-08-18 01:08:13,538 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 01:08:24,596 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 01:08:27,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3800, loss[loss=0.1132, beats_loss=0.01174, ecapa_loss=0.0001232, whisper_loss=0.1003, over 21659.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001453, whisper_loss=0.09163, over 3940617.10 frames. 
], batch size: 84, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:08:28,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3614320.0, ans=0.125 2024-08-18 01:08:31,370 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09450822323560715, model_norm_threshold=48.454036712646484 2024-08-18 01:08:31,541 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.766e+04, grad_sumsq=3.766e+04, orig_rms_sq=1.000e+00 2024-08-18 01:08:34,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3614320.0, ans=0.2 2024-08-18 01:08:47,891 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 17 from Vox, 26 from AS 2024-08-18 01:08:53,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3614520.0, ans=0.05 2024-08-18 01:09:05,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3614520.0, ans=0.1 2024-08-18 01:09:08,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3614620.0, ans=0.125 2024-08-18 01:09:17,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3614620.0, ans=0.0 2024-08-18 01:09:26,653 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
26 from LS+wenet, 10 from Vox, 29 from AS 2024-08-18 01:09:32,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3614720.0, ans=0.0 2024-08-18 01:09:34,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3614720.0, ans=0.125 2024-08-18 01:09:36,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3850, loss[loss=0.09508, beats_loss=0.01139, ecapa_loss=0.0001263, whisper_loss=0.08242, over 14602.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01053, ecapa_loss=0.0001454, whisper_loss=0.09115, over 3918148.31 frames. ], batch size: 58, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:09:50,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3614920.0, ans=0.0 2024-08-18 01:09:53,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3614920.0, ans=0.125 2024-08-18 01:09:59,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.260e+01 2.585e+01 2.853e+01 5.127e+02, threshold=5.170e+01, percent-clipped=2.0 2024-08-18 01:10:09,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3615020.0, ans=0.125 2024-08-18 01:10:26,811 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 from AS 2024-08-18 01:10:39,279 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
24 from LS+wenet, 22 from Vox, 39 from AS 2024-08-18 01:10:43,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3615220.0, ans=0.0 2024-08-18 01:10:46,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3900, loss[loss=0.119, beats_loss=0.0081, ecapa_loss=0.00017, whisper_loss=0.1092, over 23400.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001463, whisper_loss=0.09093, over 3920301.82 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:10:50,771 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 26 from Vox, 33 from AS 2024-08-18 01:11:16,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-08-18 01:11:34,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3615620.0, ans=0.1 2024-08-18 01:11:44,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3615720.0, ans=0.2 2024-08-18 01:11:50,277 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 01:11:51,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.71 vs. limit=15.0 2024-08-18 01:11:52,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2024-08-18 01:11:55,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 3950, loss[loss=0.1122, beats_loss=0.00896, ecapa_loss=0.000144, whisper_loss=0.1018, over 18198.00 frames. 
], tot_loss[loss=0.1036, beats_loss=0.01048, ecapa_loss=0.0001465, whisper_loss=0.09169, over 3939378.75 frames. ], batch size: 71, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:12:17,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.347e+01 2.555e+01 2.976e+01 4.736e+01, threshold=5.111e+01, percent-clipped=0.0 2024-08-18 01:12:25,738 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.91 vs. limit=15.0 2024-08-18 01:12:36,152 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 from AS 2024-08-18 01:12:41,532 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 26 from Vox, 30 from AS 2024-08-18 01:12:41,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3616120.0, ans=0.09899494936611666 2024-08-18 01:12:41,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3616120.0, ans=0.125 2024-08-18 01:12:47,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3616120.0, ans=0.09899494936611666 2024-08-18 01:12:51,198 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-18 01:12:55,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3616220.0, ans=0.1 2024-08-18 01:13:03,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3616320.0, ans=0.0 2024-08-18 01:13:04,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4000, loss[loss=0.06275, beats_loss=0.01089, ecapa_loss=0.0001436, whisper_loss=0.05042, over 14340.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001467, whisper_loss=0.09116, over 3910078.00 frames. ], batch size: 57, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:13:04,704 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 01:13:19,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3616420.0, ans=0.0 2024-08-18 01:13:21,495 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 16 from Vox, 37 from AS 2024-08-18 01:13:27,074 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 10 from Vox, 42 from AS 2024-08-18 01:13:29,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3616420.0, ans=0.125 2024-08-18 01:13:38,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3616520.0, ans=0.125 2024-08-18 01:13:44,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.76 vs. limit=12.0 2024-08-18 01:13:45,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3616620.0, ans=0.125 2024-08-18 01:14:13,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4050, loss[loss=0.107, beats_loss=0.008735, ecapa_loss=0.0001533, whisper_loss=0.09674, over 18314.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001465, whisper_loss=0.0906, over 3915015.05 frames. ], batch size: 74, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:14:16,424 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 24 from Vox, 27 from AS 2024-08-18 01:14:19,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3616820.0, ans=0.05 2024-08-18 01:14:36,060 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.283e+01 2.581e+01 2.870e+01 1.380e+02, threshold=5.162e+01, percent-clipped=1.0 2024-08-18 01:14:43,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.71 vs. limit=10.0 2024-08-18 01:14:45,616 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 26 from Vox, 29 from AS 2024-08-18 01:14:48,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3617020.0, ans=0.1 2024-08-18 01:14:49,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3617020.0, ans=0.0 2024-08-18 01:15:00,582 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 01:15:13,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3617220.0, ans=0.0 2024-08-18 01:15:16,068 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 17 from Vox, 20 from AS 2024-08-18 01:15:20,139 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 18 from Vox, 20 from AS 2024-08-18 01:15:23,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4100, loss[loss=0.1077, beats_loss=0.01152, ecapa_loss=0.0001343, whisper_loss=0.09483, over 22855.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01047, ecapa_loss=0.0001478, whisper_loss=0.0916, over 3888804.14 frames. 
], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:15:32,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.99 vs. limit=6.0 2024-08-18 01:15:55,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-08-18 01:15:57,758 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 01:15:59,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3617520.0, ans=0.0 2024-08-18 01:16:08,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2024-08-18 01:16:11,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3617620.0, ans=0.1 2024-08-18 01:16:17,945 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 from AS 2024-08-18 01:16:30,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4150, loss[loss=0.08898, beats_loss=0.01257, ecapa_loss=0.0001353, whisper_loss=0.07506, over 21534.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01057, ecapa_loss=0.0001477, whisper_loss=0.09131, over 3910118.70 frames. 
], batch size: 91, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:16:41,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3617820.0, ans=0.1 2024-08-18 01:16:49,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3617920.0, ans=0.0 2024-08-18 01:16:51,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.314e+01 2.519e+01 2.790e+01 4.414e+01, threshold=5.038e+01, percent-clipped=0.0 2024-08-18 01:16:58,217 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 from AS 2024-08-18 01:17:07,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-08-18 01:17:20,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-18 01:17:37,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4200, loss[loss=0.09726, beats_loss=0.0109, ecapa_loss=0.0001597, whisper_loss=0.08476, over 21108.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01061, ecapa_loss=0.0001474, whisper_loss=0.09069, over 3885208.68 frames. 
], batch size: 89, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:17:45,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3618320.0, ans=0.125 2024-08-18 01:17:45,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3618320.0, ans=0.0 2024-08-18 01:17:57,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3618420.0, ans=0.0 2024-08-18 01:18:05,101 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 01:18:20,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3618620.0, ans=0.125 2024-08-18 01:18:22,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3618620.0, ans=0.125 2024-08-18 01:18:29,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3618720.0, ans=0.1 2024-08-18 01:18:37,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.89 vs. limit=15.0 2024-08-18 01:18:41,214 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-18 01:18:42,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4250, loss[loss=0.1008, beats_loss=0.01078, ecapa_loss=0.0001315, whisper_loss=0.08869, over 21703.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001477, whisper_loss=0.09081, over 3883606.80 frames. 
], batch size: 90, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:18:57,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618920.0, ans=0.1 2024-08-18 01:18:59,685 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 12 from Vox, 30 from AS 2024-08-18 01:19:00,883 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 21 from Vox, 46 from AS 2024-08-18 01:19:03,388 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.262e+01 2.529e+01 2.791e+01 4.848e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-18 01:19:04,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3618920.0, ans=0.1 2024-08-18 01:19:05,967 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 15 from Vox, 38 from AS 2024-08-18 01:19:22,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3619120.0, ans=0.125 2024-08-18 01:19:27,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2024-08-18 01:19:34,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2024-08-18 01:19:37,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3619220.0, ans=0.0 2024-08-18 01:19:43,459 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 20 from Vox, 47 from AS 2024-08-18 01:19:46,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. 
limit=15.0 2024-08-18 01:19:48,225 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4300, loss[loss=0.0865, beats_loss=0.0129, ecapa_loss=0.0001297, whisper_loss=0.07231, over 21426.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.000147, whisper_loss=0.09121, over 3889689.98 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:19:48,332 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 from AS 2024-08-18 01:19:48,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3619320.0, ans=0.1 2024-08-18 01:19:58,947 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 from AS 2024-08-18 01:20:00,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=12.0 2024-08-18 01:20:01,361 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 28 from Vox, 40 from AS 2024-08-18 01:20:03,927 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 from AS 2024-08-18 01:20:21,765 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 22 from Vox, 23 from AS 2024-08-18 01:20:24,269 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 25 from Vox, 27 from AS 2024-08-18 01:20:27,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3619620.0, ans=0.125 2024-08-18 01:20:34,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3619620.0, ans=0.04949747468305833 2024-08-18 01:20:36,872 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
25 from LS+wenet, 13 from Vox, 29 from AS 2024-08-18 01:20:38,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3619720.0, ans=0.2 2024-08-18 01:20:41,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3619720.0, ans=0.125 2024-08-18 01:20:43,252 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 26 from LS+wenet, 22 from Vox, 49 from AS 2024-08-18 01:20:45,555 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 11 from Vox, 29 from AS 2024-08-18 01:20:48,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=12.0 2024-08-18 01:20:48,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=3619720.0, ans=12.0 2024-08-18 01:20:50,881 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 from AS 2024-08-18 01:20:52,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4350, loss[loss=0.1124, beats_loss=0.01155, ecapa_loss=0.0001527, whisper_loss=0.09934, over 22641.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001478, whisper_loss=0.09118, over 3871762.24 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:21:03,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3619920.0, ans=0.2 2024-08-18 01:21:05,653 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2024-08-18 01:21:11,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3619920.0, ans=0.1 2024-08-18 01:21:12,855 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.260e+01 2.529e+01 2.654e+01 4.063e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-18 01:21:27,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3620020.0, ans=0.125 2024-08-18 01:21:35,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3620120.0, ans=0.025 2024-08-18 01:21:41,662 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 from AS 2024-08-18 01:21:41,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3620120.0, ans=0.1 2024-08-18 01:21:55,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=12.0 2024-08-18 01:21:57,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4400, loss[loss=0.1013, beats_loss=0.01202, ecapa_loss=0.0001325, whisper_loss=0.08797, over 22405.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.000147, whisper_loss=0.09174, over 3882304.61 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:22:01,788 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.211e+00 2024-08-18 01:22:04,061 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 01:22:14,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.97 vs. 
limit=6.0 2024-08-18 01:22:19,304 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 01:22:19,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3620420.0, ans=0.0 2024-08-18 01:22:21,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.13 vs. limit=22.5 2024-08-18 01:22:35,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3620520.0, ans=0.1 2024-08-18 01:22:36,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.56 vs. limit=22.5 2024-08-18 01:22:41,215 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 11 from Vox, 22 from AS 2024-08-18 01:22:47,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3620620.0, ans=0.1 2024-08-18 01:23:08,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4450, loss[loss=0.1073, beats_loss=0.01068, ecapa_loss=0.000127, whisper_loss=0.09537, over 23487.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01042, ecapa_loss=0.0001471, whisper_loss=0.09199, over 3899633.96 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:23:28,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. 
limit=15.0 2024-08-18 01:23:31,467 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.280e+01 2.529e+01 2.933e+01 4.954e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-18 01:23:47,804 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.518e-03 2024-08-18 01:23:58,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2024-08-18 01:24:15,162 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 13 from Vox, 39 from AS 2024-08-18 01:24:19,831 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 from AS 2024-08-18 01:24:20,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4500, loss[loss=0.08067, beats_loss=0.01175, ecapa_loss=0.0001692, whisper_loss=0.06723, over 21445.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001463, whisper_loss=0.09129, over 3923573.65 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:24:25,850 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 31 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 01:24:34,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3621420.0, ans=0.015 2024-08-18 01:24:35,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3621420.0, ans=0.125 2024-08-18 01:24:46,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3621420.0, ans=0.2 2024-08-18 01:25:02,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3621520.0, ans=0.0 2024-08-18 01:25:05,520 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 01:25:23,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3621720.0, ans=0.1 2024-08-18 01:25:31,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3621720.0, ans=0.125 2024-08-18 01:25:35,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3621820.0, ans=0.125 2024-08-18 01:25:36,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4550, loss[loss=0.09393, beats_loss=0.01059, ecapa_loss=0.0001483, whisper_loss=0.08185, over 19636.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001461, whisper_loss=0.09105, over 3927848.20 frames. ], batch size: 80, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:25:37,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3621820.0, ans=0.125 2024-08-18 01:25:45,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3621820.0, ans=0.0 2024-08-18 01:26:02,122 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.967e+01 2.248e+01 2.517e+01 2.867e+01 6.444e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-18 01:26:10,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2024-08-18 01:26:23,841 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 from AS 2024-08-18 01:26:57,704 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 28 from Vox, 30 from AS 2024-08-18 01:27:01,924 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 01:27:11,265 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4600, loss[loss=0.1113, beats_loss=0.009706, ecapa_loss=0.0001463, whisper_loss=0.1002, over 19424.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001456, whisper_loss=0.09042, over 3946912.13 frames. ], batch size: 77, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:27:12,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=22.5 2024-08-18 01:27:29,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3622320.0, ans=0.125 2024-08-18 01:27:35,011 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 01:27:35,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.21 vs. limit=10.0 2024-08-18 01:27:51,814 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 01:27:59,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3622520.0, ans=0.0 2024-08-18 01:28:07,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3622520.0, ans=0.1 2024-08-18 01:28:08,857 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 01:28:23,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2024-08-18 01:28:28,175 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
21 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 01:28:35,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3622720.0, ans=0.125 2024-08-18 01:28:42,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4650, loss[loss=0.09682, beats_loss=0.01127, ecapa_loss=0.0001468, whisper_loss=0.08408, over 19386.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001463, whisper_loss=0.09042, over 3954805.37 frames. ], batch size: 76, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:28:59,551 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 01:29:05,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.266e+01 2.454e+01 2.690e+01 4.860e+01, threshold=4.909e+01, percent-clipped=0.0 2024-08-18 01:29:08,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3622920.0, ans=0.05 2024-08-18 01:29:10,746 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 01:29:24,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3623020.0, ans=0.125 2024-08-18 01:29:24,988 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 01:29:27,647 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 01:29:35,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3623120.0, ans=0.0 2024-08-18 01:29:49,881 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 01:29:55,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4700, loss[loss=0.08742, beats_loss=0.01249, ecapa_loss=0.0001272, whisper_loss=0.07366, over 20148.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001453, whisper_loss=0.09043, over 3963984.90 frames. ], batch size: 81, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:30:06,903 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 01:30:54,440 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 01:31:00,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.67 vs. limit=10.0 2024-08-18 01:31:04,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3623720.0, ans=0.0 2024-08-18 01:31:08,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4750, loss[loss=0.0745, beats_loss=0.01417, ecapa_loss=0.0001146, whisper_loss=0.05919, over 14604.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001452, whisper_loss=0.09023, over 3906176.55 frames. 
], batch size: 59, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:31:35,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.329e+01 2.503e+01 2.951e+01 3.896e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 01:31:44,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3624020.0, ans=0.125 2024-08-18 01:31:57,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3624020.0, ans=0.2 2024-08-18 01:32:00,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.31 vs. limit=15.0 2024-08-18 01:32:02,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3624120.0, ans=0.04949747468305833 2024-08-18 01:32:02,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3624120.0, ans=0.1 2024-08-18 01:32:37,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4800, loss[loss=0.1088, beats_loss=0.01056, ecapa_loss=0.000137, whisper_loss=0.09689, over 14380.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.000146, whisper_loss=0.09048, over 3927611.71 frames. ], batch size: 56, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:32:38,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3624320.0, ans=0.0 2024-08-18 01:32:42,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3624320.0, ans=0.0 2024-08-18 01:33:05,007 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 01:33:05,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3624420.0, ans=0.125 2024-08-18 01:33:12,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3624520.0, ans=0.125 2024-08-18 01:33:20,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=12.0 2024-08-18 01:33:49,202 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 01:33:53,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3624720.0, ans=0.125 2024-08-18 01:34:03,266 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 12 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 01:34:10,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4850, loss[loss=0.1006, beats_loss=0.009392, ecapa_loss=0.000166, whisper_loss=0.08958, over 15475.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01063, ecapa_loss=0.0001462, whisper_loss=0.0894, over 3910083.26 frames. ], batch size: 63, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:34:11,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.17 vs. limit=22.5 2024-08-18 01:34:34,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.805e+01 2.307e+01 2.618e+01 2.911e+01 3.524e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-18 01:34:35,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.48 vs. 
limit=15.0 2024-08-18 01:34:46,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3625020.0, ans=0.05 2024-08-18 01:34:48,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3625020.0, ans=0.125 2024-08-18 01:34:50,476 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 32 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 01:34:53,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3625120.0, ans=0.2 2024-08-18 01:35:04,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3625120.0, ans=0.2 2024-08-18 01:35:19,419 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 01:35:24,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4900, loss[loss=0.09648, beats_loss=0.01099, ecapa_loss=0.000164, whisper_loss=0.08384, over 21967.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001466, whisper_loss=0.0896, over 3911610.74 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:35:29,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-18 01:35:29,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.22 vs. limit=10.0 2024-08-18 01:35:34,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=22.5 2024-08-18 01:35:36,662 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 01:35:47,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3625420.0, ans=0.2 2024-08-18 01:35:56,282 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-18 01:36:10,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3625520.0, ans=0.0 2024-08-18 01:36:27,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3625620.0, ans=0.125 2024-08-18 01:36:27,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3625620.0, ans=0.0 2024-08-18 01:36:37,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3625720.0, ans=0.125 2024-08-18 01:36:48,238 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 01:36:51,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 4950, loss[loss=0.1101, beats_loss=0.01172, ecapa_loss=0.0001105, whisper_loss=0.09727, over 23715.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001465, whisper_loss=0.08951, over 3907345.81 frames. ], batch size: 93, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:37:06,303 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 01:37:07,949 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 01:37:11,758 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 27 from Vox, 23 fro AS 2024-08-18 01:37:12,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3625920.0, ans=0.04949747468305833 2024-08-18 01:37:20,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.328e+01 2.566e+01 3.021e+01 2.036e+02, threshold=5.131e+01, percent-clipped=1.0 2024-08-18 01:37:28,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3626020.0, ans=0.125 2024-08-18 01:37:28,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3626020.0, ans=0.0 2024-08-18 01:37:35,375 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 01:37:58,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-18 01:38:03,427 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 01:38:08,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=15.0 2024-08-18 01:38:20,259 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 01:38:24,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5000, loss[loss=0.1153, beats_loss=0.009423, ecapa_loss=0.000124, whisper_loss=0.1046, over 17922.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001464, whisper_loss=0.09058, over 3874217.13 frames. ], batch size: 66, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:38:44,892 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
20 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-18 01:38:53,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3626420.0, ans=0.1 2024-08-18 01:38:59,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2024-08-18 01:39:03,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3626520.0, ans=0.1 2024-08-18 01:39:03,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3626520.0, ans=0.125 2024-08-18 01:39:21,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3626620.0, ans=0.2 2024-08-18 01:39:22,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3626620.0, ans=0.0 2024-08-18 01:39:30,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3626720.0, ans=0.2 2024-08-18 01:39:36,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-18 01:39:38,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5050, loss[loss=0.0829, beats_loss=0.01376, ecapa_loss=0.0001263, whisper_loss=0.06788, over 18805.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001472, whisper_loss=0.09048, over 3855793.16 frames. 
], batch size: 75, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:39:39,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3626820.0, ans=0.125 2024-08-18 01:39:43,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3626820.0, ans=0.125 2024-08-18 01:39:52,264 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 01:40:00,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3626920.0, ans=0.125 2024-08-18 01:40:01,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.360e+01 2.516e+01 2.885e+01 7.843e+01, threshold=5.031e+01, percent-clipped=2.0 2024-08-18 01:40:02,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=15.0 2024-08-18 01:40:17,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3627020.0, ans=0.125 2024-08-18 01:40:23,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3627120.0, ans=0.125 2024-08-18 01:40:40,203 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 01:40:44,138 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.602e+00 2024-08-18 01:40:54,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5100, loss[loss=0.09585, beats_loss=0.01155, ecapa_loss=0.0001359, whisper_loss=0.08294, over 22268.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001466, whisper_loss=0.09074, over 3855509.16 frames. 
], batch size: 93, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:41:09,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3627320.0, ans=0.125 2024-08-18 01:41:41,714 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-18 01:41:56,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=12.0 2024-08-18 01:41:58,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3627620.0, ans=0.125 2024-08-18 01:41:59,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.29 vs. limit=15.0 2024-08-18 01:42:14,754 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 01:42:20,122 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5150, loss[loss=0.09314, beats_loss=0.01009, ecapa_loss=0.0001603, whisper_loss=0.08144, over 18604.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001464, whisper_loss=0.09108, over 3853327.36 frames. 
], batch size: 74, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:42:41,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3627920.0, ans=0.125 2024-08-18 01:42:46,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.463e+01 2.680e+01 3.103e+01 6.728e+01, threshold=5.360e+01, percent-clipped=1.0 2024-08-18 01:42:49,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3627920.0, ans=0.125 2024-08-18 01:42:49,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3627920.0, ans=0.2 2024-08-18 01:43:13,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2024-08-18 01:43:17,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3628120.0, ans=0.125 2024-08-18 01:43:20,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3628220.0, ans=0.125 2024-08-18 01:43:21,881 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 01:43:32,058 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5200, loss[loss=0.1073, beats_loss=0.009735, ecapa_loss=0.0001358, whisper_loss=0.09625, over 17268.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001454, whisper_loss=0.09125, over 3867069.48 frames. 
], batch size: 69, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:43:33,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3628320.0, ans=10.0 2024-08-18 01:43:33,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3628320.0, ans=0.0 2024-08-18 01:43:46,094 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 01:43:49,783 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 01:43:52,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3628420.0, ans=0.125 2024-08-18 01:43:53,595 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 01:43:55,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3628420.0, ans=0.2 2024-08-18 01:43:57,229 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 01:43:58,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3628520.0, ans=0.125 2024-08-18 01:44:10,309 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.552e-02 2024-08-18 01:44:26,989 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 01:44:30,505 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 01:44:34,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3628820.0, ans=0.125 2024-08-18 01:44:35,320 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-18 01:44:35,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5250, loss[loss=0.12, beats_loss=0.01048, ecapa_loss=0.0001695, whisper_loss=0.1078, over 22263.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001443, whisper_loss=0.09137, over 3891396.25 frames. ], batch size: 91, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:44:56,619 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.248e+01 2.479e+01 2.831e+01 4.103e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-18 01:45:06,883 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 01:45:10,989 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 01:45:15,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=3629120.0, ans=0.5 2024-08-18 01:45:24,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3629120.0, ans=0.0 2024-08-18 01:45:33,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3629220.0, ans=0.0 2024-08-18 01:45:37,093 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 01:45:40,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5300, loss[loss=0.1307, beats_loss=0.008803, ecapa_loss=0.0001432, whisper_loss=0.1205, over 16992.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.000145, whisper_loss=0.09114, over 3895964.66 frames. ], batch size: 66, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:45:49,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3629320.0, ans=0.0 2024-08-18 01:45:53,541 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 01:45:58,867 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 32 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 01:46:09,883 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 01:46:19,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3629520.0, ans=0.125 2024-08-18 01:46:39,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3629720.0, ans=0.2 2024-08-18 01:46:45,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3629720.0, ans=0.1 2024-08-18 01:46:54,272 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5350, loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001247, whisper_loss=0.09045, over 14378.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001452, whisper_loss=0.0909, over 3883244.24 frames. ], batch size: 55, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:47:01,055 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
20 from LS+wenet, 10 from Vox, 42 fro AS 2024-08-18 01:47:06,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3629820.0, ans=0.125 2024-08-18 01:47:07,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.42 vs. limit=10.0 2024-08-18 01:47:13,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3629920.0, ans=0.125 2024-08-18 01:47:20,944 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.419e+01 2.661e+01 2.975e+01 6.061e+01, threshold=5.322e+01, percent-clipped=1.0 2024-08-18 01:47:37,406 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 01:47:57,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=3630120.0, ans=10.0 2024-08-18 01:48:12,137 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-18 01:48:19,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5400, loss[loss=0.109, beats_loss=0.01104, ecapa_loss=0.0001188, whisper_loss=0.09682, over 23764.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001452, whisper_loss=0.09063, over 3879838.35 frames. 
], batch size: 94, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:48:20,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3630320.0, ans=0.125 2024-08-18 01:48:26,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630320.0, ans=0.1 2024-08-18 01:48:45,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3630420.0, ans=0.0 2024-08-18 01:49:07,895 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 01:49:16,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3630620.0, ans=0.0 2024-08-18 01:49:22,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3630620.0, ans=0.1 2024-08-18 01:50:00,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5450, loss[loss=0.07231, beats_loss=0.01359, ecapa_loss=0.0001621, whisper_loss=0.0571, over 20514.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.09047, over 3873167.48 frames. ], batch size: 89, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:50:08,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3630820.0, ans=0.2 2024-08-18 01:50:11,093 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-18 01:50:24,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3630820.0, ans=0.0 2024-08-18 01:50:28,345 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
25 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 01:50:41,908 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.356e+01 2.667e+01 2.958e+01 1.634e+02, threshold=5.334e+01, percent-clipped=1.0 2024-08-18 01:51:24,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3631120.0, ans=0.5 2024-08-18 01:51:29,626 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.88 vs. limit=10.0 2024-08-18 01:51:55,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3631220.0, ans=0.1 2024-08-18 01:51:58,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3631220.0, ans=0.125 2024-08-18 01:52:01,703 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5500, loss[loss=0.1005, beats_loss=0.01157, ecapa_loss=0.0001401, whisper_loss=0.08748, over 21636.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01062, ecapa_loss=0.0001445, whisper_loss=0.09012, over 3877838.06 frames. ], batch size: 88, lr: 2.46e-03, grad_scale: 1.152921504606847e+18 2024-08-18 01:52:14,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3631320.0, ans=0.2 2024-08-18 01:52:23,985 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 01:52:34,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3631420.0, ans=0.125 2024-08-18 01:52:59,213 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
36 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 01:53:03,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.41 vs. limit=22.5 2024-08-18 01:53:04,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3631620.0, ans=0.125 2024-08-18 01:53:31,322 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 01:53:51,782 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5550, loss[loss=0.102, beats_loss=0.01014, ecapa_loss=0.0001383, whisper_loss=0.09051, over 18798.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.000145, whisper_loss=0.0906, over 3904736.69 frames. ], batch size: 73, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:54:02,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3631820.0, ans=0.2 2024-08-18 01:54:33,486 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.313e+01 2.540e+01 2.843e+01 1.440e+02, threshold=5.080e+01, percent-clipped=2.0 2024-08-18 01:54:37,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2024-08-18 01:54:53,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3632020.0, ans=0.125 2024-08-18 01:55:12,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.58 vs. limit=15.0 2024-08-18 01:55:19,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632120.0, ans=0.1 2024-08-18 01:55:24,134 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 01:55:25,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3632120.0, ans=0.1 2024-08-18 01:55:47,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3632220.0, ans=0.125 2024-08-18 01:55:51,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3632320.0, ans=0.0 2024-08-18 01:55:52,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5600, loss[loss=0.0983, beats_loss=0.008845, ecapa_loss=0.0001575, whisper_loss=0.08788, over 19035.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001465, whisper_loss=0.09076, over 3883957.41 frames. ], batch size: 79, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:55:56,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3632320.0, ans=0.2 2024-08-18 01:56:26,449 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 01:56:36,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3632520.0, ans=0.0 2024-08-18 01:56:47,586 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=21.34 vs. limit=22.5 2024-08-18 01:56:54,711 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 01:57:10,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5650, loss[loss=0.1256, beats_loss=0.008747, ecapa_loss=0.000171, whisper_loss=0.1152, over 18829.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01063, ecapa_loss=0.000145, whisper_loss=0.09037, over 3917280.68 frames. 
], batch size: 77, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:57:17,573 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 01:57:35,817 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.365e+01 2.651e+01 2.943e+01 2.212e+02, threshold=5.303e+01, percent-clipped=3.0 2024-08-18 01:57:36,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3632920.0, ans=0.125 2024-08-18 01:57:47,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.27 vs. limit=10.0 2024-08-18 01:57:49,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3633020.0, ans=0.2 2024-08-18 01:57:49,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3633020.0, ans=0.0 2024-08-18 01:58:00,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3633120.0, ans=0.125 2024-08-18 01:58:04,124 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 01:58:30,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5700, loss[loss=0.0608, beats_loss=0.009095, ecapa_loss=0.0001721, whisper_loss=0.04999, over 13672.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001454, whisper_loss=0.09024, over 3915319.04 frames. 
], batch size: 55, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 01:58:30,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3633320.0, ans=0.025 2024-08-18 01:58:42,330 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 01:58:48,395 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 01:58:49,747 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 01:58:51,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3633420.0, ans=0.1 2024-08-18 01:59:07,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3633520.0, ans=0.0 2024-08-18 01:59:14,342 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 01:59:16,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3633520.0, ans=0.1 2024-08-18 01:59:30,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3633620.0, ans=15.0 2024-08-18 01:59:38,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-18 01:59:40,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3633720.0, ans=0.2 2024-08-18 01:59:44,597 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 01:59:47,536 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 01:59:53,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5750, loss[loss=0.1149, beats_loss=0.009592, ecapa_loss=0.0001744, whisper_loss=0.1036, over 20559.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001465, whisper_loss=0.08991, over 3894594.78 frames. ], batch size: 83, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:00:19,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.317e+01 2.629e+01 2.885e+01 4.138e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-18 02:00:21,519 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-18 02:00:23,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3634020.0, ans=0.0 2024-08-18 02:00:24,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3634020.0, ans=0.125 2024-08-18 02:00:35,198 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 02:00:36,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3634120.0, ans=0.125 2024-08-18 02:00:37,759 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 02:00:53,053 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-18 02:01:02,308 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 02:01:08,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5800, loss[loss=0.1011, beats_loss=0.011, ecapa_loss=0.0001568, whisper_loss=0.08854, over 21260.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01066, ecapa_loss=0.000147, whisper_loss=0.08955, over 3893098.82 frames. ], batch size: 87, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:01:19,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0 2024-08-18 02:01:23,835 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.470e-01 2024-08-18 02:01:47,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3634520.0, ans=0.07 2024-08-18 02:01:49,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3634520.0, ans=0.09899494936611666 2024-08-18 02:02:12,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3634720.0, ans=0.1 2024-08-18 02:02:22,210 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 27 from Vox, 30 fro AS 2024-08-18 02:02:25,295 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 02:02:27,861 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5850, loss[loss=0.0926, beats_loss=0.0124, ecapa_loss=0.0001461, whisper_loss=0.07874, over 22481.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001478, whisper_loss=0.08988, over 3897148.51 frames. ], batch size: 92, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:02:38,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=12.0 2024-08-18 02:02:54,041 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
33 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 02:02:55,246 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.380e+01 2.623e+01 2.964e+01 4.271e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-18 02:02:55,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3634920.0, ans=0.125 2024-08-18 02:03:12,240 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.169e-02 2024-08-18 02:03:26,370 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 02:03:41,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5900, loss[loss=0.08643, beats_loss=0.01169, ecapa_loss=0.0001385, whisper_loss=0.07335, over 18478.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001485, whisper_loss=0.08991, over 3884268.53 frames. ], batch size: 75, lr: 2.46e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:03:48,767 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 15 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 02:04:00,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3635420.0, ans=0.1 2024-08-18 02:04:00,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3635420.0, ans=0.1 2024-08-18 02:04:15,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3635520.0, ans=22.5 2024-08-18 02:04:41,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.52 vs. 
limit=15.0 2024-08-18 02:04:55,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 5950, loss[loss=0.1145, beats_loss=0.00958, ecapa_loss=0.0001163, whisper_loss=0.1038, over 15300.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001485, whisper_loss=0.09007, over 3919455.71 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:04:58,803 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 02:05:03,486 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 02:05:06,850 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 02:05:15,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-18 02:05:21,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.282e+01 2.567e+01 2.844e+01 4.690e+01, threshold=5.134e+01, percent-clipped=0.0 2024-08-18 02:05:22,046 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 26 from LS+wenet, 16 from Vox, 54 fro AS 2024-08-18 02:05:31,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3636020.0, ans=0.0 2024-08-18 02:05:32,758 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-18 02:05:42,441 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 02:05:45,375 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
19 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-18 02:05:51,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3636120.0, ans=0.0 2024-08-18 02:06:09,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-18 02:06:13,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6000, loss[loss=0.1059, beats_loss=0.01127, ecapa_loss=0.0001317, whisper_loss=0.09332, over 22462.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.000147, whisper_loss=0.0904, over 3883793.40 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:06:13,468 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 02:06:43,427 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4078, 3.1599, 2.6493, 1.7552], device='cuda:3') 2024-08-18 02:06:47,687 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on ASR_libri: loss=0.2524, beats_loss=0, ecapa_loss=0.0005148, whisper_loss=0.2472, over 922467.00 frames. 2024-08-18 02:07:03,258 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on SV_voxceleb1: loss=0.004108, beats_loss=0, ecapa_loss=0.0004108, whisper_loss=0, over 939242.00 frames. 2024-08-18 02:08:43,125 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on AT_audioset: loss=0.02328, beats_loss=0.02328, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 02:08:43,137 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 02:09:01,258 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 02:09:11,101 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-18 02:09:12,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3636520.0, ans=0.125 2024-08-18 02:09:12,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3636520.0, ans=0.0 2024-08-18 02:09:13,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3636520.0, ans=0.125 2024-08-18 02:09:23,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3636620.0, ans=0.0 2024-08-18 02:09:26,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=15.0 2024-08-18 02:09:47,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6050, loss[loss=0.1323, beats_loss=0.00864, ecapa_loss=0.0001418, whisper_loss=0.1223, over 22119.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001456, whisper_loss=0.09116, over 3893918.95 frames. ], batch size: 84, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:09:47,752 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 02:09:57,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3636820.0, ans=0.0 2024-08-18 02:09:59,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3636920.0, ans=0.125 2024-08-18 02:10:09,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.386e+01 2.689e+01 3.046e+01 1.667e+02, threshold=5.379e+01, percent-clipped=1.0 2024-08-18 02:10:19,267 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.16 vs. limit=15.0 2024-08-18 02:10:23,872 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 02:10:25,217 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 02:10:30,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3637120.0, ans=0.125 2024-08-18 02:10:47,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3637220.0, ans=0.1 2024-08-18 02:10:52,445 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6100, loss[loss=0.1045, beats_loss=0.009791, ecapa_loss=0.0001375, whisper_loss=0.09329, over 18463.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001461, whisper_loss=0.09074, over 3893460.12 frames. ], batch size: 72, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:11:03,947 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
20 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 02:11:18,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3637520.0, ans=0.125 2024-08-18 02:11:19,459 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 17 from Vox, 19 fro AS 2024-08-18 02:11:29,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3637620.0, ans=0.0 2024-08-18 02:11:33,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3637620.0, ans=0.2 2024-08-18 02:11:36,711 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 02:11:38,013 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-18 02:11:39,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3637620.0, ans=0.125 2024-08-18 02:11:48,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3637720.0, ans=0.07 2024-08-18 02:11:55,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6150, loss[loss=0.09564, beats_loss=0.01202, ecapa_loss=0.0001514, whisper_loss=0.0821, over 17527.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0106, ecapa_loss=0.0001467, whisper_loss=0.09053, over 3889675.71 frames. ], batch size: 72, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:12:02,129 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 8 from Vox, 35 fro AS 2024-08-18 02:12:14,642 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-18 02:12:16,889 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.229e+01 2.428e+01 2.694e+01 3.816e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-18 02:12:17,818 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.28 vs. limit=15.0 2024-08-18 02:12:35,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3638120.0, ans=0.125 2024-08-18 02:12:42,838 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-18 02:12:53,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-18 02:12:59,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6200, loss[loss=0.09315, beats_loss=0.009858, ecapa_loss=0.0001475, whisper_loss=0.08182, over 21613.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001466, whisper_loss=0.09033, over 3885119.91 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:13:00,940 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-18 02:13:05,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3638320.0, ans=0.0 2024-08-18 02:13:14,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3638420.0, ans=0.0 2024-08-18 02:13:49,248 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 02:13:56,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.68 vs. 
limit=15.0 2024-08-18 02:14:02,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6250, loss[loss=0.1172, beats_loss=0.00974, ecapa_loss=0.0001427, whisper_loss=0.106, over 23478.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001465, whisper_loss=0.09049, over 3865017.87 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:14:07,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3638820.0, ans=0.125 2024-08-18 02:14:16,119 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-18 02:14:19,715 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 15 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 02:14:21,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3638920.0, ans=0.125 2024-08-18 02:14:23,769 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 02:14:24,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.392e+01 2.630e+01 2.962e+01 1.833e+02, threshold=5.259e+01, percent-clipped=3.0 2024-08-18 02:14:45,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3639120.0, ans=0.0 2024-08-18 02:14:49,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3639120.0, ans=0.125 2024-08-18 02:14:50,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3639120.0, ans=0.125 2024-08-18 02:14:52,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3639220.0, ans=0.1 2024-08-18 02:14:55,187 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 02:15:06,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6300, loss[loss=0.09963, beats_loss=0.01128, ecapa_loss=0.000117, whisper_loss=0.08718, over 13800.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001455, whisper_loss=0.0906, over 3868660.72 frames. 
], batch size: 54, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:15:10,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3639320.0, ans=0.07 2024-08-18 02:15:10,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3639320.0, ans=0.125 2024-08-18 02:15:14,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3639320.0, ans=0.1 2024-08-18 02:15:15,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3639320.0, ans=0.2 2024-08-18 02:15:21,614 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 02:15:23,454 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 02:15:34,845 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 02:15:36,036 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 31 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 02:15:40,981 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 02:15:45,017 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 02:15:50,318 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.074e-02 2024-08-18 02:15:50,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-18 02:16:00,188 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 02:16:04,094 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
18 from LS+wenet, 25 from Vox, 51 fro AS 2024-08-18 02:16:04,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3639720.0, ans=0.04949747468305833 2024-08-18 02:16:10,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6350, loss[loss=0.1045, beats_loss=0.0108, ecapa_loss=0.0001503, whisper_loss=0.09217, over 21165.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.09055, over 3873110.39 frames. ], batch size: 85, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:16:11,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2024-08-18 02:16:32,121 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.636e+01 2.370e+01 2.587e+01 3.086e+01 3.331e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-18 02:16:35,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3639920.0, ans=0.125 2024-08-18 02:16:43,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3640020.0, ans=0.125 2024-08-18 02:16:52,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3640120.0, ans=0.125 2024-08-18 02:16:57,014 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 02:17:00,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3640120.0, ans=0.125 2024-08-18 02:17:03,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3640220.0, ans=0.0 2024-08-18 02:17:07,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3640220.0, ans=0.125 2024-08-18 02:17:08,806 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.763e-01 2024-08-18 02:17:13,521 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 02:17:13,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3640220.0, ans=0.2 2024-08-18 02:17:17,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6400, loss[loss=0.09677, beats_loss=0.01159, ecapa_loss=0.000126, whisper_loss=0.08393, over 22432.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001469, whisper_loss=0.09046, over 3892797.64 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:17:21,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3640320.0, ans=0.1 2024-08-18 02:17:23,617 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 02:17:32,319 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 02:17:32,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640420.0, ans=0.1 2024-08-18 02:17:52,150 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 02:18:00,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640620.0, ans=0.1 2024-08-18 02:18:04,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3640620.0, ans=0.1 2024-08-18 02:18:08,513 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 02:18:13,595 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 02:18:17,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3640720.0, ans=0.0 2024-08-18 02:18:19,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6450, loss[loss=0.1055, beats_loss=0.01078, ecapa_loss=0.0001292, whisper_loss=0.09341, over 22492.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001476, whisper_loss=0.09067, over 3899942.53 frames. ], batch size: 90, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:18:37,235 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=12.0 2024-08-18 02:18:41,562 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.367e+01 2.593e+01 3.031e+01 7.766e+01, threshold=5.185e+01, percent-clipped=4.0 2024-08-18 02:18:50,558 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
27 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 02:19:01,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=3641120.0, ans=0.02 2024-08-18 02:19:01,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3641120.0, ans=0.1 2024-08-18 02:19:20,571 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 02:19:22,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6500, loss[loss=0.09546, beats_loss=0.01185, ecapa_loss=0.0001195, whisper_loss=0.08241, over 23393.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001481, whisper_loss=0.09067, over 3889957.12 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:19:29,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3641320.0, ans=0.025 2024-08-18 02:19:50,607 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09095240384340286, model_norm_threshold=51.85380935668945 2024-08-18 02:19:50,775 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.024e+04, grad_sumsq=4.024e+04, orig_rms_sq=1.000e+00 2024-08-18 02:19:54,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3641520.0, ans=0.125 2024-08-18 02:19:58,722 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 02:20:03,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3641620.0, ans=0.125 2024-08-18 02:20:04,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-08-18 02:20:06,043 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 02:20:24,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3641720.0, ans=0.0 2024-08-18 02:20:26,008 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6550, loss[loss=0.1062, beats_loss=0.009849, ecapa_loss=0.0001481, whisper_loss=0.0949, over 21860.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001484, whisper_loss=0.09098, over 3914221.69 frames. ], batch size: 88, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:20:44,983 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-18 02:20:47,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.617e+01 2.352e+01 2.620e+01 3.012e+01 5.701e+02, threshold=5.240e+01, percent-clipped=4.0 2024-08-18 02:21:07,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3642120.0, ans=0.2 2024-08-18 02:21:13,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.45 vs. limit=15.0 2024-08-18 02:21:15,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3642220.0, ans=0.1 2024-08-18 02:21:20,365 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 02:21:29,277 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6600, loss[loss=0.1098, beats_loss=0.01029, ecapa_loss=0.0001594, whisper_loss=0.09794, over 21844.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001494, whisper_loss=0.09129, over 3917122.94 frames. ], batch size: 88, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:22:21,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2024-08-18 02:22:23,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3642720.0, ans=0.2 2024-08-18 02:22:25,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3642720.0, ans=0.125 2024-08-18 02:22:28,607 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 02:22:32,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6650, loss[loss=0.07922, beats_loss=0.01131, ecapa_loss=0.0001374, whisper_loss=0.06654, over 14056.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001496, whisper_loss=0.09081, over 3899836.85 frames. ], batch size: 56, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:22:36,318 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 02:22:38,844 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 02:22:53,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.312e+01 2.682e+01 2.929e+01 4.632e+01, threshold=5.364e+01, percent-clipped=0.0 2024-08-18 02:23:04,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3643020.0, ans=0.0 2024-08-18 02:23:07,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=3643020.0, ans=12.0 2024-08-18 02:23:19,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3643120.0, ans=0.2 2024-08-18 02:23:21,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3643120.0, ans=0.1 2024-08-18 02:23:28,645 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 02:23:36,041 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6700, loss[loss=0.08181, beats_loss=0.009615, ecapa_loss=0.0001873, whisper_loss=0.07032, over 12163.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001492, whisper_loss=0.09079, over 3870092.18 frames. ], batch size: 54, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:23:36,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3643320.0, ans=0.125 2024-08-18 02:23:52,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3643420.0, ans=0.5 2024-08-18 02:23:54,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3643420.0, ans=0.1 2024-08-18 02:23:55,881 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
22 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 02:24:10,664 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 02:24:15,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-18 02:24:27,283 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 02:24:30,937 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 02:24:32,123 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 02:24:38,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3643820.0, ans=0.07 2024-08-18 02:24:39,517 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6750, loss[loss=0.1045, beats_loss=0.01033, ecapa_loss=0.0001413, whisper_loss=0.09277, over 16994.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001482, whisper_loss=0.09021, over 3843325.91 frames. ], batch size: 69, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:24:55,168 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 02:24:56,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=3643920.0, ans=0.1 2024-08-18 02:25:01,355 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.309e+01 2.478e+01 2.753e+01 4.386e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 02:25:05,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3644020.0, ans=0.0 2024-08-18 02:25:18,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3644120.0, ans=0.0 2024-08-18 02:25:36,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3644220.0, ans=0.2 2024-08-18 02:25:43,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6800, loss[loss=0.08876, beats_loss=0.01327, ecapa_loss=0.0001574, whisper_loss=0.07391, over 14307.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001491, whisper_loss=0.08999, over 3834784.61 frames. ], batch size: 60, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:25:52,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. 
limit=15.0 2024-08-18 02:25:53,073 WARNING [optim.py:496] (3/4) Scaling gradients by 0.054543543606996536, model_norm_threshold=49.555973052978516 2024-08-18 02:25:53,235 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.396e+05, grad_sumsq=4.165e+04, orig_rms_sq=3.352e+00 2024-08-18 02:26:02,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3644420.0, ans=0.125 2024-08-18 02:26:16,842 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 02:26:31,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3644620.0, ans=0.125 2024-08-18 02:26:35,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3644720.0, ans=0.1 2024-08-18 02:26:47,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6850, loss[loss=0.1022, beats_loss=0.01176, ecapa_loss=0.0001068, whisper_loss=0.08935, over 17660.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001483, whisper_loss=0.09043, over 3839031.29 frames. ], batch size: 67, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:26:48,059 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 02:26:58,211 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 02:27:09,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.331e+01 2.507e+01 2.800e+01 9.086e+02, threshold=5.014e+01, percent-clipped=3.0 2024-08-18 02:27:11,172 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 02:27:21,502 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 02:27:24,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3645020.0, ans=0.125 2024-08-18 02:27:49,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3645220.0, ans=0.1 2024-08-18 02:27:51,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6900, loss[loss=0.1071, beats_loss=0.01044, ecapa_loss=0.0001518, whisper_loss=0.0951, over 21713.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.08997, over 3857186.89 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:27:59,341 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 02:28:03,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3645420.0, ans=0.5 2024-08-18 02:28:23,475 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 02:28:33,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3645620.0, ans=0.025 2024-08-18 02:28:41,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3645720.0, ans=0.125 2024-08-18 02:28:44,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3645720.0, ans=0.2 2024-08-18 02:28:47,539 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 10 from Vox, 27 fro AS 2024-08-18 02:28:49,777 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 02:28:54,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 6950, loss[loss=0.1056, beats_loss=0.01004, ecapa_loss=0.0001653, whisper_loss=0.09392, over 17490.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001464, whisper_loss=0.08986, over 3858804.25 frames. ], batch size: 73, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:28:56,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-18 02:29:03,027 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 02:29:06,795 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 02:29:12,684 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 02:29:16,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.274e+01 2.573e+01 2.776e+01 4.485e+01, threshold=5.146e+01, percent-clipped=0.0 2024-08-18 02:29:17,858 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 02:29:27,601 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-18 02:29:35,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3646120.0, ans=0.0 2024-08-18 02:29:50,755 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 02:29:58,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-18 02:29:58,557 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
13 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 02:29:59,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7000, loss[loss=0.0829, beats_loss=0.01294, ecapa_loss=0.0001109, whisper_loss=0.06885, over 15087.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001467, whisper_loss=0.08976, over 3863475.48 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:30:02,181 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 02:30:17,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646420.0, ans=0.1 2024-08-18 02:30:20,796 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 02:30:21,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-08-18 02:30:33,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3646520.0, ans=0.04949747468305833 2024-08-18 02:30:39,362 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 02:30:46,032 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-18 02:30:54,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2024-08-18 02:31:02,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.87 vs. 
limit=22.5 2024-08-18 02:31:03,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3646720.0, ans=0.1 2024-08-18 02:31:09,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7050, loss[loss=0.09964, beats_loss=0.009839, ecapa_loss=0.0001534, whisper_loss=0.08827, over 13480.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.000146, whisper_loss=0.08977, over 3875043.83 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:31:09,366 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 02:31:23,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3646920.0, ans=0.1 2024-08-18 02:31:25,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3646920.0, ans=0.125 2024-08-18 02:31:30,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3646920.0, ans=0.2 2024-08-18 02:31:32,741 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.265e+01 2.505e+01 2.734e+01 3.913e+01, threshold=5.009e+01, percent-clipped=0.0 2024-08-18 02:31:49,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3647120.0, ans=0.0 2024-08-18 02:32:02,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.38 vs. limit=22.5 2024-08-18 02:32:05,383 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 02:32:07,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3647220.0, ans=0.125 2024-08-18 02:32:13,625 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7100, loss[loss=0.1197, beats_loss=0.009235, ecapa_loss=0.0001382, whisper_loss=0.109, over 21691.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001469, whisper_loss=0.08935, over 3856908.64 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:32:13,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3647320.0, ans=0.0 2024-08-18 02:32:42,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3647520.0, ans=0.125 2024-08-18 02:33:15,155 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7150, loss[loss=0.1129, beats_loss=0.009023, ecapa_loss=0.000179, whisper_loss=0.102, over 21533.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01058, ecapa_loss=0.0001466, whisper_loss=0.08898, over 3863168.31 frames. 
], batch size: 87, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:33:16,700 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:33:22,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3647820.0, ans=0.125 2024-08-18 02:33:31,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3647920.0, ans=0.0 2024-08-18 02:33:36,649 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.260e+01 2.515e+01 2.745e+01 4.282e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-18 02:33:40,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=3648020.0, ans=15.0 2024-08-18 02:33:45,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3648020.0, ans=0.125 2024-08-18 02:34:13,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3648220.0, ans=0.125 2024-08-18 02:34:19,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7200, loss[loss=0.1143, beats_loss=0.008335, ecapa_loss=0.0001793, whisper_loss=0.1042, over 18967.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001471, whisper_loss=0.08995, over 3870377.42 frames. ], batch size: 77, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:34:26,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3648320.0, ans=0.0 2024-08-18 02:34:28,323 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 02:34:44,005 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 02:34:57,578 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 15 from Vox, 47 fro AS 2024-08-18 02:35:07,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3648620.0, ans=0.125 2024-08-18 02:35:08,697 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-18 02:35:14,770 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 02:35:21,968 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7250, loss[loss=0.09994, beats_loss=0.01121, ecapa_loss=0.0001618, whisper_loss=0.08711, over 17216.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001464, whisper_loss=0.08971, over 3896344.44 frames. ], batch size: 70, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:35:22,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3648820.0, ans=0.2 2024-08-18 02:35:27,025 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
36 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-18 02:35:34,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3648920.0, ans=0.125 2024-08-18 02:35:35,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3648920.0, ans=0.07 2024-08-18 02:35:43,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.335e+01 2.544e+01 2.816e+01 3.698e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-18 02:35:46,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3649020.0, ans=0.125 2024-08-18 02:35:58,875 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=6.995e-02 2024-08-18 02:36:01,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3649120.0, ans=0.0 2024-08-18 02:36:02,251 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 02:36:11,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2024-08-18 02:36:16,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3649220.0, ans=0.0 2024-08-18 02:36:24,294 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7300, loss[loss=0.09973, beats_loss=0.009772, ecapa_loss=0.0001476, whisper_loss=0.08848, over 21710.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001471, whisper_loss=0.09043, over 3915601.94 frames. 
], batch size: 83, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:36:47,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3649420.0, ans=0.2 2024-08-18 02:37:00,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3649520.0, ans=0.0 2024-08-18 02:37:07,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3649620.0, ans=0.025 2024-08-18 02:37:13,823 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 02:37:24,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3649720.0, ans=0.0 2024-08-18 02:37:27,374 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7350, loss[loss=0.09588, beats_loss=0.01251, ecapa_loss=0.0001396, whisper_loss=0.08197, over 20727.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.000147, whisper_loss=0.09029, over 3914455.10 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:37:28,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.59 vs. limit=22.5 2024-08-18 02:37:32,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3649820.0, ans=0.125 2024-08-18 02:37:48,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.729e+01 2.224e+01 2.419e+01 2.653e+01 3.642e+01, threshold=4.838e+01, percent-clipped=0.0 2024-08-18 02:37:55,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.15 vs. 
limit=22.5 2024-08-18 02:38:01,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3650020.0, ans=0.1 2024-08-18 02:38:03,711 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 12 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-18 02:38:05,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-18 02:38:09,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-18 02:38:19,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3650220.0, ans=0.0 2024-08-18 02:38:29,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7400, loss[loss=0.1023, beats_loss=0.01171, ecapa_loss=0.0001356, whisper_loss=0.08928, over 21642.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001459, whisper_loss=0.08928, over 3904770.83 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:38:54,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3650520.0, ans=0.125 2024-08-18 02:38:58,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=8.0 2024-08-18 02:39:13,726 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 02:39:21,522 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 02:39:24,003 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
27 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 02:39:30,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7450, loss[loss=0.1071, beats_loss=0.009665, ecapa_loss=0.0001447, whisper_loss=0.09595, over 19141.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001455, whisper_loss=0.09062, over 3917572.17 frames. ], batch size: 74, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:39:32,526 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 02:39:41,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3650820.0, ans=0.1 2024-08-18 02:39:46,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3650920.0, ans=0.1 2024-08-18 02:39:52,034 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.351e+01 2.534e+01 2.762e+01 3.772e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-18 02:39:53,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3650920.0, ans=0.0 2024-08-18 02:40:08,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3651120.0, ans=0.0 2024-08-18 02:40:21,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3651220.0, ans=0.125 2024-08-18 02:40:21,822 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 02:40:22,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3651220.0, ans=0.125 2024-08-18 02:40:23,019 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
22 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 02:40:28,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3651220.0, ans=0.1 2024-08-18 02:40:32,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7500, loss[loss=0.1183, beats_loss=0.009064, ecapa_loss=0.0001544, whisper_loss=0.1077, over 18660.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001458, whisper_loss=0.09071, over 3902413.12 frames. ], batch size: 72, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:40:44,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3651420.0, ans=0.0 2024-08-18 02:40:54,177 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 02:40:54,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3651420.0, ans=0.125 2024-08-18 02:41:11,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.90 vs. limit=12.0 2024-08-18 02:41:13,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3651620.0, ans=0.04949747468305833 2024-08-18 02:41:29,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3651720.0, ans=0.1 2024-08-18 02:41:31,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.61 vs. 
limit=10.0 2024-08-18 02:41:34,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3651820.0, ans=0.0 2024-08-18 02:41:35,014 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7550, loss[loss=0.0906, beats_loss=0.01328, ecapa_loss=0.0001615, whisper_loss=0.0757, over 14784.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.000147, whisper_loss=0.08963, over 3882389.83 frames. ], batch size: 65, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:41:35,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3651820.0, ans=0.1 2024-08-18 02:41:43,860 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-18 02:41:46,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0 2024-08-18 02:41:56,105 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.283e+01 2.523e+01 2.754e+01 3.706e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-18 02:41:56,724 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:42:34,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3652220.0, ans=0.0 2024-08-18 02:42:37,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7600, loss[loss=0.09577, beats_loss=0.01111, ecapa_loss=0.000215, whisper_loss=0.08251, over 21029.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001466, whisper_loss=0.08995, over 3883446.38 frames. 
], batch size: 93, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:42:38,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3652320.0, ans=0.0 2024-08-18 02:42:42,940 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 02:42:46,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2024-08-18 02:42:46,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3652320.0, ans=0.125 2024-08-18 02:42:50,324 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-18 02:42:57,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3652420.0, ans=0.0 2024-08-18 02:43:06,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3652520.0, ans=0.0 2024-08-18 02:43:07,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.06 vs. 
limit=10.0 2024-08-18 02:43:07,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3652520.0, ans=0.1 2024-08-18 02:43:11,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3652520.0, ans=0.0 2024-08-18 02:43:24,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3652620.0, ans=0.125 2024-08-18 02:43:24,234 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:43:27,644 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 02:43:28,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3652720.0, ans=0.125 2024-08-18 02:43:32,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3652720.0, ans=0.125 2024-08-18 02:43:36,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3652720.0, ans=0.0 2024-08-18 02:43:40,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7650, loss[loss=0.1263, beats_loss=0.009252, ecapa_loss=0.0001096, whisper_loss=0.116, over 15772.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001461, whisper_loss=0.09106, over 3888886.02 frames. 
], batch size: 55, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:43:51,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3652920.0, ans=0.125 2024-08-18 02:44:01,635 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.389e+01 2.635e+01 3.049e+01 5.266e+01, threshold=5.270e+01, percent-clipped=1.0 2024-08-18 02:44:20,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3653120.0, ans=0.125 2024-08-18 02:44:23,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=22.5 2024-08-18 02:44:27,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3653120.0, ans=0.0 2024-08-18 02:44:29,369 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 02:44:34,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3653220.0, ans=0.1 2024-08-18 02:44:43,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7700, loss[loss=0.09202, beats_loss=0.01194, ecapa_loss=0.0001621, whisper_loss=0.07846, over 21366.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001459, whisper_loss=0.08988, over 3898674.99 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:44:54,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3653420.0, ans=0.1 2024-08-18 02:44:54,479 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.294e-02 2024-08-18 02:45:05,522 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
27 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-18 02:45:05,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3653420.0, ans=0.125 2024-08-18 02:45:19,388 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:45:27,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3653620.0, ans=0.125 2024-08-18 02:45:40,951 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 02:45:44,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7750, loss[loss=0.11, beats_loss=0.008449, ecapa_loss=0.0001631, whisper_loss=0.09991, over 22721.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001452, whisper_loss=0.0896, over 3936225.57 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:45:51,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3653820.0, ans=0.0 2024-08-18 02:45:53,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3653820.0, ans=0.1 2024-08-18 02:45:57,046 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 30 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-18 02:46:06,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.345e+01 2.664e+01 3.102e+01 5.489e+01, threshold=5.327e+01, percent-clipped=1.0 2024-08-18 02:46:07,710 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 14 from Vox, 48 fro AS 2024-08-18 02:46:19,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3654020.0, ans=0.2 2024-08-18 02:46:25,418 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
21 from LS+wenet, 29 from Vox, 32 fro AS 2024-08-18 02:46:37,458 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 02:46:38,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=15.0 2024-08-18 02:46:47,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7800, loss[loss=0.09683, beats_loss=0.007722, ecapa_loss=0.0001772, whisper_loss=0.08733, over 13764.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001455, whisper_loss=0.09016, over 3941949.36 frames. ], batch size: 55, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:46:52,565 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 02:47:05,170 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 15 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 02:47:11,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-08-18 02:47:22,734 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 02:47:32,602 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
22 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 02:47:44,018 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:47:46,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3654720.0, ans=0.125 2024-08-18 02:47:46,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3654720.0, ans=0.1 2024-08-18 02:47:48,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3654820.0, ans=0.0 2024-08-18 02:47:49,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7850, loss[loss=0.09433, beats_loss=0.01271, ecapa_loss=0.000132, whisper_loss=0.0803, over 20995.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01071, ecapa_loss=0.0001458, whisper_loss=0.08993, over 3938063.26 frames. ], batch size: 85, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:48:10,560 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.350e+01 2.619e+01 2.873e+01 2.983e+02, threshold=5.237e+01, percent-clipped=1.0 2024-08-18 02:48:26,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3655120.0, ans=0.125 2024-08-18 02:48:33,050 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 02:48:34,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3655120.0, ans=0.0 2024-08-18 02:48:51,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7900, loss[loss=0.08575, beats_loss=0.01208, ecapa_loss=0.0001525, whisper_loss=0.07215, over 20823.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01068, ecapa_loss=0.0001463, whisper_loss=0.0903, over 3946116.84 frames. 
], batch size: 87, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:49:16,246 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-18 02:49:21,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3655520.0, ans=0.125 2024-08-18 02:49:28,619 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 02:49:30,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3655620.0, ans=0.125 2024-08-18 02:49:53,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 7950, loss[loss=0.09568, beats_loss=0.01175, ecapa_loss=0.0001369, whisper_loss=0.08256, over 22538.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01067, ecapa_loss=0.0001461, whisper_loss=0.09087, over 3934548.05 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:50:11,215 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 02:50:13,831 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 02:50:14,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.294e+01 2.620e+01 2.907e+01 9.033e+01, threshold=5.239e+01, percent-clipped=1.0 2024-08-18 02:50:15,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3655920.0, ans=0.2 2024-08-18 02:50:23,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3656020.0, ans=0.035 2024-08-18 02:50:27,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3656020.0, ans=0.125 2024-08-18 02:50:33,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3656120.0, ans=0.125 2024-08-18 02:50:55,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8000, loss[loss=0.07749, beats_loss=0.01251, ecapa_loss=0.0001342, whisper_loss=0.06364, over 21039.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01073, ecapa_loss=0.000145, whisper_loss=0.09096, over 3949825.31 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:51:08,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3656420.0, ans=0.0 2024-08-18 02:51:14,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3656420.0, ans=0.1 2024-08-18 02:51:17,876 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 02:51:21,758 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 02:51:28,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2024-08-18 02:51:54,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3656720.0, ans=0.09899494936611666 2024-08-18 02:51:57,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8050, loss[loss=0.1032, beats_loss=0.01068, ecapa_loss=0.0001392, whisper_loss=0.09108, over 23542.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01072, ecapa_loss=0.0001448, whisper_loss=0.09064, over 3922722.11 frames. ], batch size: 93, lr: 2.45e-03, grad_scale: 1.152921504606847e+18 2024-08-18 02:52:01,212 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 02:52:02,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3656820.0, ans=0.1 2024-08-18 02:52:09,905 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 02:52:10,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3656920.0, ans=0.0 2024-08-18 02:52:19,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.387e+01 2.609e+01 2.994e+01 4.076e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-18 02:52:27,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3657020.0, ans=0.2 2024-08-18 02:52:27,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.71 vs. 
limit=15.0 2024-08-18 02:52:31,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3657020.0, ans=0.125 2024-08-18 02:52:31,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3657020.0, ans=0.0 2024-08-18 02:52:46,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3657220.0, ans=0.0 2024-08-18 02:52:47,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=15.0 2024-08-18 02:52:53,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.88 vs. limit=15.0 2024-08-18 02:52:59,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8100, loss[loss=0.1324, beats_loss=0.009021, ecapa_loss=0.0001557, whisper_loss=0.1218, over 22893.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.0001455, whisper_loss=0.09078, over 3910321.81 frames. ], batch size: 91, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:52:59,876 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 02:53:09,883 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-18 02:53:18,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3657420.0, ans=0.125 2024-08-18 02:53:25,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3657520.0, ans=0.0 2024-08-18 02:53:29,757 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
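The `grad_scale` values in the batch lines are exact powers of two (1.152921504606847e+18 = 2^60, 5.764607523034235e+17 = 2^59), and the drop from 2^60 back to 2^59 around batch 8100 is consistent with dynamic loss scaling that backs off when a non-finite gradient is found. A sketch of that update rule, assuming GradScaler-style semantics rather than the script's exact logic:

```python
def update_grad_scale(scale, found_inf,
                      growth_factor=2.0, backoff_factor=0.5):
    # Dynamic loss scaling: halve the scale when an inf/nan gradient is
    # detected, otherwise grow it (real implementations grow only after
    # a run of successful steps, omitted here for brevity).
    return scale * (backoff_factor if found_inf else growth_factor)
```

The doubling/halving pattern keeps the scale a power of two, which is exactly what the logged values show.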
27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 02:53:29,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3657520.0, ans=0.0 2024-08-18 02:53:42,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3657620.0, ans=0.1 2024-08-18 02:53:53,261 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-18 02:53:56,097 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2024-08-18 02:53:58,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3657720.0, ans=0.0 2024-08-18 02:54:02,040 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8150, loss[loss=0.1, beats_loss=0.01204, ecapa_loss=0.0001202, whisper_loss=0.08675, over 19791.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01068, ecapa_loss=0.0001457, whisper_loss=0.09032, over 3903007.06 frames. ], batch size: 78, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:54:05,839 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 02:54:15,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3657920.0, ans=0.0 2024-08-18 02:54:24,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.183e+01 2.473e+01 2.825e+01 3.767e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-18 02:54:38,083 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 02:54:51,716 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 02:55:02,586 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 02:55:03,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8200, loss[loss=0.1191, beats_loss=0.01105, ecapa_loss=0.0001607, whisper_loss=0.1064, over 22598.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01067, ecapa_loss=0.000146, whisper_loss=0.09076, over 3939603.41 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:55:20,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3658420.0, ans=0.125 2024-08-18 02:55:46,886 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 02:55:47,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3658620.0, ans=0.0 2024-08-18 02:55:54,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3658720.0, ans=6.0 2024-08-18 02:55:55,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-18 02:56:05,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8250, loss[loss=0.08167, beats_loss=0.01249, ecapa_loss=0.0001287, whisper_loss=0.06789, over 22026.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0107, ecapa_loss=0.0001451, whisper_loss=0.09017, over 3947482.99 frames. 
], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:56:12,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3658820.0, ans=0.125 2024-08-18 02:56:14,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3658820.0, ans=0.125 2024-08-18 02:56:19,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. limit=15.0 2024-08-18 02:56:27,174 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.331e+01 2.566e+01 2.988e+01 3.927e+01, threshold=5.131e+01, percent-clipped=0.0 2024-08-18 02:56:38,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3659020.0, ans=0.125 2024-08-18 02:56:42,360 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-18 02:57:03,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2024-08-18 02:57:07,215 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8300, loss[loss=0.09444, beats_loss=0.009905, ecapa_loss=0.0001428, whisper_loss=0.0831, over 14155.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01067, ecapa_loss=0.0001445, whisper_loss=0.09019, over 3943883.90 frames. 
], batch size: 54, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:57:15,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3659320.0, ans=0.0 2024-08-18 02:57:18,966 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 02:57:51,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3659620.0, ans=0.0 2024-08-18 02:57:51,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3659620.0, ans=0.0 2024-08-18 02:58:03,767 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 02:58:04,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3659720.0, ans=0.125 2024-08-18 02:58:09,751 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8350, loss[loss=0.1143, beats_loss=0.01029, ecapa_loss=0.0001493, whisper_loss=0.1025, over 22571.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01068, ecapa_loss=0.0001448, whisper_loss=0.09022, over 3935124.85 frames. ], batch size: 87, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:58:18,617 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
20 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-18 02:58:20,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3659820.0, ans=0.0 2024-08-18 02:58:28,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3659920.0, ans=0.2 2024-08-18 02:58:32,232 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.239e+01 2.555e+01 2.867e+01 4.693e+01, threshold=5.109e+01, percent-clipped=0.0 2024-08-18 02:58:36,204 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 02:59:05,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3660220.0, ans=0.2 2024-08-18 02:59:06,356 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 9 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 02:59:09,904 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-18 02:59:12,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8400, loss[loss=0.07956, beats_loss=0.01068, ecapa_loss=0.0001574, whisper_loss=0.06731, over 16629.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0107, ecapa_loss=0.0001447, whisper_loss=0.08989, over 3920726.19 frames. 
], batch size: 73, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 02:59:17,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3660320.0, ans=0.125 2024-08-18 02:59:24,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3660420.0, ans=0.125 2024-08-18 02:59:36,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3660520.0, ans=0.2 2024-08-18 02:59:38,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3660520.0, ans=0.125 2024-08-18 02:59:46,317 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 02:59:59,583 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 03:00:01,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3660720.0, ans=0.125 2024-08-18 03:00:14,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8450, loss[loss=0.1351, beats_loss=0.005796, ecapa_loss=0.0001644, whisper_loss=0.1276, over 23591.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001459, whisper_loss=0.09061, over 3907044.88 frames. ], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:00:17,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=18.27 vs. limit=15.0 2024-08-18 03:00:28,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3660920.0, ans=0.1 2024-08-18 03:00:29,133 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
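The many `ScheduledFloat` lines report parameters (dropout probabilities, skip rates, balancer probs) whose value `ans` is a function of `batch_count`. A hedged sketch of a batch-count-indexed piecewise-linear schedule; the class name, constructor, and knot values below are illustrative, since the log only shows each parameter's current value:

```python
class PiecewiseLinearSchedule:
    """Value interpolated linearly between (batch_count, value) knots,
    clamped at the endpoints. Interface is a sketch, not icefall's."""

    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count):
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a dropout_p that decays from 0.3 to 0.1 over the first 20k batches
# (hypothetical knots; by batch_count ~3.65M it sits at its final value,
# matching the constant ans=0.1 seen for the dropout_p entries above)
dropout_p = PiecewiseLinearSchedule((0, 0.3), (20000, 0.1))
```

This is why resumed runs deep into training (here, `batch_idx_train` past 3.6M) log constant `ans` values: the schedules have long since reached their final knots.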
27 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 03:00:31,400 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 03:00:32,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3660920.0, ans=0.125 2024-08-18 03:00:35,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.643e+01 2.348e+01 2.564e+01 2.815e+01 4.784e+01, threshold=5.128e+01, percent-clipped=0.0 2024-08-18 03:00:39,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.49 vs. limit=15.0 2024-08-18 03:00:40,280 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 03:00:52,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3661120.0, ans=0.2 2024-08-18 03:01:00,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3661120.0, ans=0.04949747468305833 2024-08-18 03:01:04,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3661220.0, ans=0.125 2024-08-18 03:01:07,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3661220.0, ans=0.125 2024-08-18 03:01:13,615 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 03:01:16,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8500, loss[loss=0.09545, beats_loss=0.0111, ecapa_loss=0.0001691, whisper_loss=0.08266, over 21530.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001461, whisper_loss=0.09071, over 3899091.01 frames. 
], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:01:25,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3661320.0, ans=0.125 2024-08-18 03:01:28,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-18 03:01:35,118 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 6 from Vox, 30 fro AS 2024-08-18 03:01:46,126 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 03:01:56,251 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-18 03:02:01,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3661620.0, ans=0.125 2024-08-18 03:02:10,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3661720.0, ans=0.2 2024-08-18 03:02:18,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8550, loss[loss=0.1068, beats_loss=0.01223, ecapa_loss=0.0001299, whisper_loss=0.09326, over 22927.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001461, whisper_loss=0.09106, over 3876147.03 frames. 
], batch size: 92, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:02:22,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3661820.0, ans=0.0 2024-08-18 03:02:40,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.327e+01 2.493e+01 2.871e+01 1.525e+02, threshold=4.987e+01, percent-clipped=2.0 2024-08-18 03:02:52,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3662020.0, ans=0.125 2024-08-18 03:02:53,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3662020.0, ans=0.1 2024-08-18 03:03:04,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3662120.0, ans=0.125 2024-08-18 03:03:17,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.98 vs. limit=15.0 2024-08-18 03:03:20,355 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8600, loss[loss=0.1153, beats_loss=0.01026, ecapa_loss=0.0001389, whisper_loss=0.1037, over 22501.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01032, ecapa_loss=0.0001472, whisper_loss=0.09229, over 3908827.73 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:03:38,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=15.0 2024-08-18 03:03:56,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3662620.0, ans=0.1 2024-08-18 03:04:00,131 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 03:04:11,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.13 vs. limit=22.5 2024-08-18 03:04:22,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8650, loss[loss=0.1264, beats_loss=0.007998, ecapa_loss=0.0001586, whisper_loss=0.1168, over 22070.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0104, ecapa_loss=0.0001479, whisper_loss=0.09169, over 3923189.03 frames. ], batch size: 84, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:04:32,354 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 03:04:35,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3662920.0, ans=0.0 2024-08-18 03:04:44,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.998e+01 2.275e+01 2.600e+01 2.961e+01 1.282e+02, threshold=5.200e+01, percent-clipped=4.0 2024-08-18 03:04:48,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3663020.0, ans=0.0 2024-08-18 03:04:53,050 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 03:05:07,242 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-18 03:05:25,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8700, loss[loss=0.0752, beats_loss=0.01467, ecapa_loss=9.701e-05, whisper_loss=0.05956, over 20818.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001468, whisper_loss=0.09057, over 3934612.37 frames. 
], batch size: 82, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:05:33,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3663320.0, ans=0.5 2024-08-18 03:05:34,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3663320.0, ans=0.0 2024-08-18 03:05:34,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3663320.0, ans=0.0 2024-08-18 03:05:48,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3663420.0, ans=0.0 2024-08-18 03:05:54,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3663520.0, ans=0.1 2024-08-18 03:06:02,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3663620.0, ans=0.0 2024-08-18 03:06:16,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3663720.0, ans=0.125 2024-08-18 03:06:20,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3663720.0, ans=0.125 2024-08-18 03:06:26,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=12.0 2024-08-18 03:06:28,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8750, loss[loss=0.1219, beats_loss=0.01113, ecapa_loss=0.0001207, whisper_loss=0.1096, over 16018.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001466, whisper_loss=0.09038, over 3926121.06 frames. 
], batch size: 59, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:06:47,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3663920.0, ans=0.2 2024-08-18 03:06:50,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3663920.0, ans=0.1 2024-08-18 03:06:51,112 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.333e+01 2.575e+01 2.893e+01 4.359e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 03:07:02,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3664020.0, ans=0.025 2024-08-18 03:07:02,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3664020.0, ans=0.04949747468305833 2024-08-18 03:07:10,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3664120.0, ans=0.0 2024-08-18 03:07:11,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3664120.0, ans=0.0 2024-08-18 03:07:31,103 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8800, loss[loss=0.08937, beats_loss=0.0104, ecapa_loss=0.0001406, whisper_loss=0.07757, over 21160.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001464, whisper_loss=0.09063, over 3937820.87 frames. ], batch size: 86, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:07:38,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.05 vs. 
limit=22.5 2024-08-18 03:07:46,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3664420.0, ans=0.0 2024-08-18 03:07:49,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3664420.0, ans=0.125 2024-08-18 03:07:52,411 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 14 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 03:08:22,074 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 03:08:26,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3664720.0, ans=0.0 2024-08-18 03:08:30,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3664720.0, ans=0.125 2024-08-18 03:08:35,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8850, loss[loss=0.08694, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.07495, over 15117.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001451, whisper_loss=0.09037, over 3897503.06 frames. ], batch size: 59, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:08:42,019 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-18 03:08:55,947 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 36 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 03:08:58,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.273e+01 2.462e+01 2.757e+01 3.654e+01, threshold=4.925e+01, percent-clipped=0.0 2024-08-18 03:09:25,971 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
33 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 03:09:31,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3665220.0, ans=0.0 2024-08-18 03:09:38,322 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 31 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 03:09:40,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8900, loss[loss=0.09816, beats_loss=0.01117, ecapa_loss=0.0001229, whisper_loss=0.08575, over 22672.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.0911, over 3888649.38 frames. ], batch size: 89, lr: 2.45e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:09:42,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3665320.0, ans=0.0 2024-08-18 03:09:57,480 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 03:10:10,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3665520.0, ans=0.07 2024-08-18 03:10:27,819 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04924134910106659, model_norm_threshold=49.24854278564453 2024-08-18 03:10:27,983 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.092e+05, grad_sumsq=2.092e+05, orig_rms_sq=1.000e+00 2024-08-18 03:10:48,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 8950, loss[loss=0.08946, beats_loss=0.01097, ecapa_loss=0.0001525, whisper_loss=0.07697, over 15611.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001441, whisper_loss=0.08996, over 3875484.78 frames. 
], batch size: 64, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:11:12,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.306e+01 2.588e+01 2.937e+01 1.000e+03, threshold=5.176e+01, percent-clipped=1.0 2024-08-18 03:11:27,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3666120.0, ans=0.125 2024-08-18 03:11:29,429 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 03:11:32,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666120.0, ans=0.1 2024-08-18 03:11:40,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3666220.0, ans=0.125 2024-08-18 03:11:54,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9000, loss[loss=0.1104, beats_loss=0.01004, ecapa_loss=0.0001649, whisper_loss=0.09867, over 18199.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001456, whisper_loss=0.09016, over 3882916.88 frames. ], batch size: 72, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:11:54,328 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 03:12:27,817 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on ASR_libri: loss=0.254, beats_loss=0, ecapa_loss=0.0005275, whisper_loss=0.2487, over 922467.00 frames. 2024-08-18 03:12:44,037 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on SV_voxceleb1: loss=0.004142, beats_loss=0, ecapa_loss=0.0004142, whisper_loss=0, over 939242.00 frames. 
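The `optim.py:476` lines report five grad-norm quantiles plus a clipping threshold. Empirically the threshold equals `Clipping_scale` times the median (the third quantile): e.g. in the record above, 2.0 x 2.588e+01 = 5.176e+01, matching `threshold=5.176e+01`. A minimal sketch of that rule, inferred from the logged numbers rather than taken from `optim.py` itself:

```python
# Hypothetical reconstruction: threshold = Clipping_scale * median grad norm,
# where the median is the middle entry of the reported quantile list.
def clipping_threshold(grad_norm_quantiles, clipping_scale=2.0):
    """Derive the clip threshold as clipping_scale times the median grad norm."""
    median = grad_norm_quantiles[len(grad_norm_quantiles) // 2]
    return clipping_scale * median

# Quantiles from the clipping record above (min/25%/50%/75%/max, presumably).
quantiles = [1.808e+01, 2.306e+01, 2.588e+01, 2.937e+01, 1.000e+03]
print(clipping_threshold(quantiles))  # 51.76, matching threshold=5.176e+01
```

Using the running median rather than a fixed constant makes the threshold adapt to the typical gradient scale, so only genuine outlier batches (like the 1.000e+03 max here) get clipped.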
2024-08-18 03:13:19,980 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0008, 0.0480, 0.0010, 0.0285, 0.0012, 0.0836, 0.0237, 0.0433], device='cuda:3') 2024-08-18 03:14:12,717 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.3700, 1.4400, 1.5231, 1.2020, 1.2426, 1.5464, 1.5978, 1.2158], device='cuda:3') 2024-08-18 03:14:18,529 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on AT_audioset: loss=0.02319, beats_loss=0.02319, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 03:14:18,532 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 03:14:21,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=3666320.0, ans=0.1 2024-08-18 03:14:48,841 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.356e-01 2024-08-18 03:14:50,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-18 03:15:01,423 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 03:15:08,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3666620.0, ans=0.1 2024-08-18 03:15:14,526 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 03:15:27,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3666720.0, ans=0.0 2024-08-18 03:15:31,295 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
15 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 03:15:32,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9050, loss[loss=0.08267, beats_loss=0.01052, ecapa_loss=0.0001215, whisper_loss=0.07093, over 16127.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001473, whisper_loss=0.09029, over 3876382.33 frames. ], batch size: 61, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:15:33,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3666820.0, ans=0.125 2024-08-18 03:15:48,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3666920.0, ans=0.125 2024-08-18 03:15:51,232 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=15.0 2024-08-18 03:15:59,334 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.276e+01 2.505e+01 2.759e+01 3.701e+01, threshold=5.009e+01, percent-clipped=0.0 2024-08-18 03:16:09,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3667020.0, ans=0.1 2024-08-18 03:16:37,968 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 33 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 03:16:44,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9100, loss[loss=0.1134, beats_loss=0.008245, ecapa_loss=0.0001427, whisper_loss=0.1037, over 18156.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001475, whisper_loss=0.09052, over 3901078.70 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:16:45,975 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 03:16:47,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3667320.0, ans=0.2 2024-08-18 03:16:51,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3667320.0, ans=0.125 2024-08-18 03:17:18,585 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:17:37,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3667620.0, ans=0.0 2024-08-18 03:17:44,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.28 vs. limit=12.0 2024-08-18 03:17:45,173 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 03:17:48,294 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-18 03:17:50,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3667720.0, ans=0.2 2024-08-18 03:17:51,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3667720.0, ans=0.125 2024-08-18 03:17:54,345 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 03:17:56,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9150, loss[loss=0.1222, beats_loss=0.009485, ecapa_loss=0.0001484, whisper_loss=0.1113, over 22743.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001461, whisper_loss=0.0914, over 3909153.97 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:18:13,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3667920.0, ans=0.1 2024-08-18 03:18:14,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3667920.0, ans=0.1 2024-08-18 03:18:22,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.335e+01 2.594e+01 2.955e+01 5.294e+01, threshold=5.187e+01, percent-clipped=1.0 2024-08-18 03:18:34,810 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:18:45,965 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 03:18:51,057 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 03:19:01,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3668220.0, ans=0.125 2024-08-18 03:19:03,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2024-08-18 03:19:09,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9200, loss[loss=0.09975, beats_loss=0.01105, ecapa_loss=0.0001132, whisper_loss=0.08757, over 23368.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001469, whisper_loss=0.0916, over 3932368.21 frames. ], batch size: 91, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:19:10,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-18 03:19:24,723 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 03:19:33,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3668420.0, ans=0.0 2024-08-18 03:19:41,421 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.96 vs. limit=15.0 2024-08-18 03:19:44,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3668520.0, ans=0.125 2024-08-18 03:19:55,214 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 03:19:58,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3668620.0, ans=0.09899494936611666 2024-08-18 03:20:21,440 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 28 from Vox, 28 fro AS 2024-08-18 03:20:23,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9250, loss[loss=0.09802, beats_loss=0.01263, ecapa_loss=0.0001395, whisper_loss=0.084, over 18248.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.0001466, whisper_loss=0.09147, over 3929393.92 frames. ], batch size: 76, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:20:32,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3668820.0, ans=0.0 2024-08-18 03:20:51,777 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.259e+01 2.533e+01 2.843e+01 4.399e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 03:20:59,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3669020.0, ans=0.125 2024-08-18 03:21:00,519 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-18 03:21:04,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3669020.0, ans=0.0 2024-08-18 03:21:33,945 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 03:21:36,579 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 03:21:40,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9300, loss[loss=0.08884, beats_loss=0.0125, ecapa_loss=0.0001574, whisper_loss=0.07477, over 21817.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01053, ecapa_loss=0.0001464, whisper_loss=0.09141, over 3949644.47 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:21:56,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3669420.0, ans=0.035 2024-08-18 03:22:08,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3669420.0, ans=0.125 2024-08-18 03:22:18,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2024-08-18 03:22:21,028 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 03:22:27,765 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 03:22:36,824 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 03:22:47,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3669720.0, ans=0.0 2024-08-18 03:22:48,247 WARNING [optim.py:496] (3/4) Scaling gradients by 0.053159911185503006, model_norm_threshold=50.65474319458008 2024-08-18 03:22:48,411 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.966e+05, grad_sumsq=1.966e+05, orig_rms_sq=1.000e+00 2024-08-18 03:22:56,863 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9350, loss[loss=0.08634, beats_loss=0.01264, ecapa_loss=0.0001501, whisper_loss=0.0722, over 19014.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001475, whisper_loss=0.09091, over 3912132.86 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:23:21,392 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 03:23:22,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3669920.0, ans=0.125 2024-08-18 03:23:24,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.389e+01 2.563e+01 2.950e+01 9.529e+02, threshold=5.125e+01, percent-clipped=2.0 2024-08-18 03:23:30,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3670020.0, ans=0.0 2024-08-18 03:23:40,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3670020.0, ans=0.0 2024-08-18 03:23:41,540 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
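The `WARNING [optim.py:496]` lines ("Scaling gradients by ...") look like norm-based clipping: when a batch's gradient norm exceeds `model_norm_threshold`, gradients are multiplied by `threshold / grad_norm`. This is inferred from the logged numbers (the factor 0.05316 with threshold 50.6547 implies a grad norm of about 952.9, which then appears as the max quantile 9.529e+02 in the next quartile report); a sketch under that assumption, not the actual `optim.py` implementation:

```python
# Hypothetical reconstruction of the gradient-scaling rule behind the
# "Scaling gradients by ..." warnings; grad_norm here is the total model
# gradient norm for the batch.
def grad_scale_factor(grad_norm, model_norm_threshold):
    """Return the multiplier applied to gradients (1.0 when under threshold)."""
    return min(1.0, model_norm_threshold / grad_norm)

factor = grad_scale_factor(grad_norm=952.9,
                           model_norm_threshold=50.65474319458008)
assert abs(factor - 0.053159911185503006) < 1e-4  # matches the logged factor
```

The companion `Parameter dominating tot_sumsq` line is diagnostic output identifying which parameter (here `module.encoder_embed.out_norm.log_scale`) contributed most to the oversized norm.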
13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 03:23:48,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3670120.0, ans=0.1 2024-08-18 03:23:59,274 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 03:24:08,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=3670220.0, ans=0.125 2024-08-18 03:24:11,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9400, loss[loss=0.1129, beats_loss=0.01025, ecapa_loss=0.0001429, whisper_loss=0.1012, over 17625.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01057, ecapa_loss=0.0001466, whisper_loss=0.09034, over 3911657.90 frames. ], batch size: 70, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:24:25,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3670420.0, ans=0.0 2024-08-18 03:24:46,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3670520.0, ans=0.125 2024-08-18 03:25:18,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3670720.0, ans=0.0 2024-08-18 03:25:26,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9450, loss[loss=0.08276, beats_loss=0.01212, ecapa_loss=0.000159, whisper_loss=0.06905, over 18597.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001471, whisper_loss=0.08994, over 3886326.89 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:25:35,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3670820.0, ans=0.0 2024-08-18 03:25:51,134 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
22 from LS+wenet, 29 from Vox, 43 fro AS 2024-08-18 03:25:55,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.676e+01 2.213e+01 2.450e+01 2.826e+01 4.775e+01, threshold=4.900e+01, percent-clipped=0.0 2024-08-18 03:26:09,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3671020.0, ans=0.1 2024-08-18 03:26:09,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.72 vs. limit=6.0 2024-08-18 03:26:21,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3671120.0, ans=0.0 2024-08-18 03:26:41,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3671220.0, ans=0.0 2024-08-18 03:26:45,630 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9500, loss[loss=0.116, beats_loss=0.009661, ecapa_loss=0.0001088, whisper_loss=0.1053, over 19774.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001459, whisper_loss=0.08949, over 3908205.67 frames. ], batch size: 72, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:26:54,523 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 03:27:08,641 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 03:27:09,774 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
22 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-18 03:27:30,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3671620.0, ans=0.1 2024-08-18 03:27:37,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3671620.0, ans=0.0 2024-08-18 03:27:43,797 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 03:27:53,376 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 03:27:55,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3671720.0, ans=0.2 2024-08-18 03:27:59,175 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9550, loss[loss=0.1063, beats_loss=0.009661, ecapa_loss=0.0001348, whisper_loss=0.09533, over 20450.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01058, ecapa_loss=0.0001473, whisper_loss=0.08932, over 3888749.60 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:28:25,960 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.340e+01 2.613e+01 3.002e+01 5.321e+01, threshold=5.225e+01, percent-clipped=2.0 2024-08-18 03:28:26,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3671920.0, ans=0.2 2024-08-18 03:28:50,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3672120.0, ans=0.125 2024-08-18 03:29:10,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3672320.0, ans=0.0 2024-08-18 03:29:11,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9600, loss[loss=0.08313, beats_loss=0.01146, ecapa_loss=0.0001369, whisper_loss=0.0703, over 18646.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01058, ecapa_loss=0.0001475, whisper_loss=0.08866, over 3872452.34 frames. ], batch size: 76, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:29:16,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.58 vs. limit=10.0 2024-08-18 03:29:38,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.47 vs. limit=15.0 2024-08-18 03:29:47,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=3672520.0, ans=15.0 2024-08-18 03:29:59,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3672620.0, ans=0.2 2024-08-18 03:30:08,322 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 03:30:12,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-18 03:30:12,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=8.0 2024-08-18 03:30:15,615 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 03:30:18,064 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-18 03:30:24,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9650, loss[loss=0.1001, beats_loss=0.009738, ecapa_loss=0.000187, whisper_loss=0.08847, over 20477.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01057, ecapa_loss=0.0001475, whisper_loss=0.08825, over 3810451.69 frames. 
], batch size: 86, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:30:36,743 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 03:30:48,394 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 03:30:50,765 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.281e+01 2.485e+01 2.819e+01 4.590e+01, threshold=4.970e+01, percent-clipped=0.0 2024-08-18 03:31:05,158 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 03:31:19,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.12 vs. limit=15.0 2024-08-18 03:31:34,307 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 03:31:35,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3673220.0, ans=0.0 2024-08-18 03:31:46,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9700, loss[loss=0.07509, beats_loss=0.01167, ecapa_loss=0.0001373, whisper_loss=0.06204, over 20539.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01058, ecapa_loss=0.0001467, whisper_loss=0.08838, over 3830353.15 frames. 
], batch size: 86, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:31:58,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3673320.0, ans=0.125 2024-08-18 03:32:05,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3673420.0, ans=0.07 2024-08-18 03:32:22,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3673520.0, ans=0.125 2024-08-18 03:32:29,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2024-08-18 03:32:33,718 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 03:32:36,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3673520.0, ans=0.0 2024-08-18 03:33:12,933 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9750, loss[loss=0.1182, beats_loss=0.01046, ecapa_loss=0.0001541, whisper_loss=0.1062, over 20350.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01063, ecapa_loss=0.0001458, whisper_loss=0.08827, over 3822543.27 frames. ], batch size: 82, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:33:30,586 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-18 03:33:41,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.12 vs. limit=15.0 2024-08-18 03:33:47,830 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.256e+01 2.570e+01 2.973e+01 3.927e+01, threshold=5.141e+01, percent-clipped=0.0 2024-08-18 03:33:52,036 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
22 from LS+wenet, 10 from Vox, 29 fro AS 2024-08-18 03:34:01,120 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 03:34:03,445 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 03:34:07,205 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 03:34:08,945 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 03:34:09,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3674020.0, ans=0.1 2024-08-18 03:34:10,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=22.5 2024-08-18 03:34:14,801 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 03:34:19,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3674120.0, ans=0.1 2024-08-18 03:34:20,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3674120.0, ans=0.0 2024-08-18 03:34:20,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3674120.0, ans=0.2 2024-08-18 03:34:39,522 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 03:34:45,589 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 28 from Vox, 26 fro AS 2024-08-18 03:34:55,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9800, loss[loss=0.1049, beats_loss=0.009641, ecapa_loss=0.0001444, whisper_loss=0.09384, over 15984.00 frames. 
], tot_loss[loss=0.1004, beats_loss=0.01066, ecapa_loss=0.0001459, whisper_loss=0.0883, over 3847155.78 frames. ], batch size: 60, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:34:58,279 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-18 03:35:03,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3674320.0, ans=0.0 2024-08-18 03:35:15,267 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 03:35:40,640 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09654395282268524, model_norm_threshold=51.40534973144531 2024-08-18 03:35:40,801 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.176e+04, grad_sumsq=6.832e+03, orig_rms_sq=9.039e+00 2024-08-18 03:35:58,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3674620.0, ans=0.09899494936611666 2024-08-18 03:36:04,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3674620.0, ans=0.0 2024-08-18 03:36:19,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3674720.0, ans=0.1 2024-08-18 03:36:27,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3674720.0, ans=0.0 2024-08-18 03:36:31,762 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. 
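The 03:35:40 WARNING above ("Scaling gradients by 0.0965..., model_norm_threshold=51.405...") is consistent with the scaling factor being `model_norm_threshold / grad_norm`, so the clipped gradient ends up with norm equal to the threshold. Inverting it recovers the offending batch's gradient norm, which then shows up as the max quartile (5.325e+02) in the next `optim.py` line. A hedged sketch of that inference; the function name is illustrative, not the icefall API:

```python
# Sketch: when a batch's gradient norm exceeds model_norm_threshold,
# gradients appear to be multiplied by threshold / grad_norm so the
# clipped norm equals the threshold.

def grad_scaling_factor(grad_norm: float, model_norm_threshold: float) -> float:
    """Factor applied to gradients when grad_norm > threshold, else 1.0."""
    return min(1.0, model_norm_threshold / grad_norm)

# Invert the logged warning to recover the offending batch's grad norm:
factor = 0.09654395282268524
threshold = 51.40534973144531
grad_norm = threshold / factor
print(round(grad_norm, 1))  # ≈ 532.5, the 5.325e+02 max quartile logged shortly after
```

The companion "Parameter dominating tot_sumsq" line attributes most of that spike to a single parameter (`self_attn_weights.in_proj.bias`, proportion 0.22), which is what makes such diagnostics actionable.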
limit=15.0 2024-08-18 03:36:36,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9850, loss[loss=0.1194, beats_loss=0.009016, ecapa_loss=0.0001003, whisper_loss=0.1093, over 17733.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001443, whisper_loss=0.0893, over 3848084.47 frames. ], batch size: 64, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:36:49,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3674820.0, ans=0.125 2024-08-18 03:37:12,183 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.310e+01 2.559e+01 2.794e+01 5.325e+02, threshold=5.118e+01, percent-clipped=2.0 2024-08-18 03:37:13,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2024-08-18 03:37:15,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3675020.0, ans=0.2 2024-08-18 03:37:33,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3675020.0, ans=0.125 2024-08-18 03:37:51,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3675120.0, ans=0.025 2024-08-18 03:37:58,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-08-18 03:38:21,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9900, loss[loss=0.08076, beats_loss=0.01272, ecapa_loss=0.000149, whisper_loss=0.06656, over 19652.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01066, ecapa_loss=0.0001453, whisper_loss=0.08932, over 3859514.78 frames. 
], batch size: 81, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:38:37,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3675420.0, ans=0.09899494936611666 2024-08-18 03:38:44,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.86 vs. limit=12.0 2024-08-18 03:38:59,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3675520.0, ans=0.2 2024-08-18 03:39:08,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3675620.0, ans=0.125 2024-08-18 03:39:12,252 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 26 from Vox, 11 fro AS 2024-08-18 03:39:12,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3675620.0, ans=0.2 2024-08-18 03:39:22,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.90 vs. limit=12.0 2024-08-18 03:39:25,134 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 03:39:25,444 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:39:34,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 9950, loss[loss=0.1213, beats_loss=0.01037, ecapa_loss=0.0001631, whisper_loss=0.1093, over 18679.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01063, ecapa_loss=0.0001461, whisper_loss=0.08956, over 3853811.35 frames. 
], batch size: 76, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:39:35,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3675820.0, ans=0.2 2024-08-18 03:39:35,554 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0 2024-08-18 03:39:43,716 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 03:39:57,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3675920.0, ans=0.0 2024-08-18 03:40:00,871 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.255e+01 2.488e+01 2.824e+01 3.882e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-18 03:40:10,268 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 03:40:16,135 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-08-18 03:40:22,352 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-18 03:40:30,613 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 11 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 03:40:32,010 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 03:40:48,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10000, loss[loss=0.09528, beats_loss=0.00948, ecapa_loss=0.0001495, whisper_loss=0.0843, over 13714.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001476, whisper_loss=0.08958, over 3815483.43 frames. ], batch size: 54, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:40:50,336 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
28 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 03:40:53,028 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 03:41:03,638 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 03:41:12,456 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 03:41:24,120 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:41:47,383 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 03:41:50,466 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 03:41:50,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3676720.0, ans=0.125 2024-08-18 03:41:52,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0 2024-08-18 03:41:55,032 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 40 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 03:42:05,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10050, loss[loss=0.09883, beats_loss=0.01116, ecapa_loss=0.0001503, whisper_loss=0.08617, over 23416.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001472, whisper_loss=0.09035, over 3831545.37 frames. 
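Each "A total of N cuts" line above breaks one batch down by source corpus (LibriSpeech+WenetSpeech, VoxCeleb, AudioSet), and the three counts sum to N. A small parsing sketch, assuming the exact line format seen in this log ("fro AS" is the literal message text emitted by `train_multi_KD3.py`):

```python
import re

# Sketch: parse the per-batch cut-count lines and check the per-corpus
# counts add up to the total. The pattern copies the literal message
# text, including its "fro AS" spelling.
CUT_RE = re.compile(
    r"A total of (\d+) cuts\. (\d+) from LS\+wenet, (\d+) from Vox, (\d+) fro AS"
)

def parse_cut_counts(line: str):
    """Return (total, ls_wenet, vox, audioset) or None if no match."""
    m = CUT_RE.search(line)
    if m is None:
        return None
    total, ls, vox, audioset = map(int, m.groups())
    assert ls + vox + audioset == total, "per-corpus counts should sum to the total"
    return total, ls, vox, audioset

line = "A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS"
print(parse_cut_counts(line))  # (90, 31, 20, 39)
```

Aggregating these tuples over a log gives the effective corpus mixing ratio actually seen by the sampler, which is useful for checking the `repeat_librispeech=5` / `use_audioset` configuration took effect.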
], batch size: 95, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 03:42:17,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3676820.0, ans=0.05 2024-08-18 03:42:18,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3676920.0, ans=0.125 2024-08-18 03:42:31,483 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.309e+01 2.495e+01 2.743e+01 6.109e+01, threshold=4.990e+01, percent-clipped=1.0 2024-08-18 03:42:41,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3677020.0, ans=0.2 2024-08-18 03:42:49,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3677120.0, ans=0.125 2024-08-18 03:42:59,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3677120.0, ans=0.125 2024-08-18 03:43:07,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3677220.0, ans=0.125 2024-08-18 03:43:17,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10100, loss[loss=0.08898, beats_loss=0.0101, ecapa_loss=0.0001685, whisper_loss=0.0772, over 18840.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001467, whisper_loss=0.09043, over 3853056.30 frames. 
], batch size: 78, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:43:23,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3677320.0, ans=0.1 2024-08-18 03:43:23,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3677320.0, ans=0.0 2024-08-18 03:43:32,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3677420.0, ans=0.0 2024-08-18 03:43:47,972 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-18 03:43:55,087 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 03:44:06,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=12.0 2024-08-18 03:44:23,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3677720.0, ans=0.1 2024-08-18 03:44:30,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3677720.0, ans=0.0 2024-08-18 03:44:34,270 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10150, loss[loss=0.1048, beats_loss=0.009809, ecapa_loss=0.0001541, whisper_loss=0.09345, over 20129.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001477, whisper_loss=0.09075, over 3876376.56 frames. ], batch size: 81, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:44:45,641 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 03:44:54,299 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
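Between batches 10050 and 10100 the logged `grad_scale` doubles from 5.764607523034235e+17 to 1.152921504606847e+18, i.e. from 2^59 to 2^60. That is the usual dynamic-loss-scaling policy (as in `torch.cuda.amp.GradScaler`): double the scale after a run of overflow-free steps, halve it on overflow. A toy sketch of that policy under an assumed growth interval; the class is illustrative, not the scaler this run actually uses:

```python
# Sketch: dynamic loss scaling. The scale doubles after `growth_interval`
# consecutive overflow-free steps and is cut back on overflow, mirroring
# the grad_scale jump from 2**59 to 2**60 seen in this log.

class ToyGradScaler:
    def __init__(self, init_scale=2.0**59, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_overflow: bool) -> None:
        if found_overflow:
            self.scale *= self.backoff_factor  # back off immediately
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor  # grow after a clean stretch
                self._good_steps = 0

scaler = ToyGradScaler()
for _ in range(2000):            # an overflow-free stretch
    scaler.update(found_overflow=False)
print(scaler.scale == 2.0**60)   # True: 5.76e+17 doubled to 1.15e+18
```

Powers of two are used so that scaling and unscaling are exact in floating point.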
10 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 03:44:54,636 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:44:59,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.325e+01 2.524e+01 2.849e+01 6.626e+01, threshold=5.048e+01, percent-clipped=1.0 2024-08-18 03:45:00,481 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 03:45:02,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3678020.0, ans=0.125 2024-08-18 03:45:26,557 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 03:45:31,267 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 03:45:31,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.93 vs. limit=22.5 2024-08-18 03:45:38,958 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 03:45:43,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.19 vs. limit=15.0 2024-08-18 03:45:47,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10200, loss[loss=0.09054, beats_loss=0.011, ecapa_loss=0.0001145, whisper_loss=0.0784, over 17491.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.000146, whisper_loss=0.08998, over 3840265.51 frames. ], batch size: 65, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:45:53,502 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 03:46:08,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3678420.0, ans=0.125 2024-08-18 03:46:12,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3678420.0, ans=0.1 2024-08-18 03:46:22,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3678520.0, ans=0.2 2024-08-18 03:46:40,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3678620.0, ans=0.0 2024-08-18 03:46:54,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3678720.0, ans=0.2 2024-08-18 03:46:59,092 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-08-18 03:46:59,691 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 14 from Vox, 47 fro AS 2024-08-18 03:47:01,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10250, loss[loss=0.1066, beats_loss=0.01189, ecapa_loss=0.000112, whisper_loss=0.09362, over 23194.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01043, ecapa_loss=0.0001473, whisper_loss=0.08995, over 3842697.28 frames. 
], batch size: 91, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:47:06,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3678820.0, ans=0.125 2024-08-18 03:47:08,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3678820.0, ans=0.125 2024-08-18 03:47:13,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3678920.0, ans=0.0 2024-08-18 03:47:27,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.63 vs. limit=15.0 2024-08-18 03:47:27,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.753e+01 2.259e+01 2.500e+01 2.782e+01 3.829e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-18 03:47:40,943 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 03:47:45,755 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=7.504e-02 2024-08-18 03:47:54,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3679120.0, ans=0.125 2024-08-18 03:47:56,756 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 03:48:15,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10300, loss[loss=0.1169, beats_loss=0.01053, ecapa_loss=0.0001377, whisper_loss=0.105, over 22316.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001472, whisper_loss=0.09054, over 3852874.79 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:48:23,373 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 03:48:44,428 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 03:48:46,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3679520.0, ans=0.125 2024-08-18 03:48:51,298 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 03:48:56,907 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 03:49:07,849 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 03:49:30,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10350, loss[loss=0.1264, beats_loss=0.009682, ecapa_loss=0.0001631, whisper_loss=0.1151, over 21426.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001466, whisper_loss=0.09106, over 3887835.61 frames. ], batch size: 85, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:49:37,797 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-18 03:49:38,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.59 vs. limit=15.0 2024-08-18 03:49:39,355 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 03:49:53,969 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 03:49:59,399 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.361e+01 2.642e+01 2.935e+01 4.206e+01, threshold=5.285e+01, percent-clipped=0.0 2024-08-18 03:50:16,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3680120.0, ans=0.0 2024-08-18 03:50:24,299 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-18 03:50:48,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10400, loss[loss=0.118, beats_loss=0.01075, ecapa_loss=0.000124, whisper_loss=0.106, over 16493.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001457, whisper_loss=0.09066, over 3877370.01 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:50:51,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3680320.0, ans=0.2 2024-08-18 03:50:57,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3680320.0, ans=0.0 2024-08-18 03:50:58,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3680320.0, ans=0.07 2024-08-18 03:51:10,589 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 03:51:16,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3680520.0, ans=0.07 2024-08-18 03:51:30,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3680520.0, ans=0.2 2024-08-18 03:51:35,562 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 03:51:37,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3680620.0, ans=0.2 2024-08-18 03:51:39,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3680620.0, ans=0.0 2024-08-18 03:51:40,844 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 03:51:57,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3680720.0, ans=0.125 2024-08-18 03:52:02,457 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10450, loss[loss=0.1147, beats_loss=0.009631, ecapa_loss=0.000153, whisper_loss=0.1035, over 23255.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001457, whisper_loss=0.09067, over 3840350.04 frames. ], batch size: 92, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:52:13,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3680820.0, ans=0.1 2024-08-18 03:52:15,899 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 03:52:28,738 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.400e+01 2.629e+01 2.967e+01 1.519e+02, threshold=5.258e+01, percent-clipped=2.0 2024-08-18 03:52:29,164 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 03:52:38,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3681020.0, ans=0.0 2024-08-18 03:52:40,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.77 vs. 
limit=10.0 2024-08-18 03:52:51,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3681120.0, ans=0.125 2024-08-18 03:53:03,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3681220.0, ans=0.125 2024-08-18 03:53:04,399 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 03:53:08,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3681220.0, ans=0.05 2024-08-18 03:53:09,010 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 03:53:16,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10500, loss[loss=0.09573, beats_loss=0.01363, ecapa_loss=0.0001037, whisper_loss=0.08107, over 18680.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001453, whisper_loss=0.08985, over 3832508.61 frames. ], batch size: 71, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:53:16,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3681320.0, ans=0.1 2024-08-18 03:53:34,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2024-08-18 03:53:35,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3681420.0, ans=10.0 2024-08-18 03:53:42,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3681420.0, ans=0.125 2024-08-18 03:53:51,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3681520.0, ans=0.125 2024-08-18 03:53:51,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3681520.0, ans=0.125 2024-08-18 03:53:53,849 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 18 from Vox, 17 fro AS 2024-08-18 03:54:07,301 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 03:54:25,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3681720.0, ans=0.125 2024-08-18 03:54:31,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.55 vs. limit=10.0 2024-08-18 03:54:31,620 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10550, loss[loss=0.0909, beats_loss=0.01325, ecapa_loss=0.0001262, whisper_loss=0.07639, over 20023.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001454, whisper_loss=0.08973, over 3860712.72 frames. 
], batch size: 83, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:54:57,720 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.276e+01 2.579e+01 2.977e+01 5.501e+01, threshold=5.159e+01, percent-clipped=1.0 2024-08-18 03:55:14,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3682020.0, ans=0.04949747468305833 2024-08-18 03:55:18,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3682120.0, ans=0.0 2024-08-18 03:55:31,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3682120.0, ans=0.125 2024-08-18 03:55:38,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-18 03:55:45,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3682220.0, ans=0.125 2024-08-18 03:55:51,061 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10600, loss[loss=0.09837, beats_loss=0.0102, ecapa_loss=0.0001365, whisper_loss=0.08681, over 21818.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001461, whisper_loss=0.08954, over 3899083.92 frames. ], batch size: 88, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:55:58,182 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 03:55:59,486 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 03:56:20,661 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
26 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 03:56:24,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3682520.0, ans=0.0 2024-08-18 03:56:57,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3682720.0, ans=0.125 2024-08-18 03:57:02,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3682720.0, ans=0.125 2024-08-18 03:57:07,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10650, loss[loss=0.09238, beats_loss=0.01267, ecapa_loss=0.0001383, whisper_loss=0.07833, over 20981.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001449, whisper_loss=0.08968, over 3875487.43 frames. ], batch size: 86, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:57:09,149 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 03:57:18,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3682820.0, ans=0.125 2024-08-18 03:57:21,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3682920.0, ans=0.0 2024-08-18 03:57:33,975 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.262e+01 2.551e+01 2.791e+01 4.688e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-18 03:57:49,505 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 03:57:51,059 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 03:57:58,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3683120.0, ans=0.125 2024-08-18 03:58:05,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3683120.0, ans=0.125 2024-08-18 03:58:19,272 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 03:58:23,105 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10700, loss[loss=0.0976, beats_loss=0.01075, ecapa_loss=0.0001313, whisper_loss=0.08553, over 20973.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001459, whisper_loss=0.09028, over 3866841.47 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:58:28,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3683320.0, ans=0.0 2024-08-18 03:58:39,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2024-08-18 03:58:54,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3683520.0, ans=10.0 2024-08-18 03:59:14,585 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 03:59:14,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3683620.0, ans=0.025 2024-08-18 03:59:19,908 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 03:59:21,869 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 30 from Vox, 25 fro AS 2024-08-18 03:59:31,493 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
26 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 03:59:33,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2024-08-18 03:59:38,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10750, loss[loss=0.09129, beats_loss=0.01017, ecapa_loss=0.0001974, whisper_loss=0.07915, over 21096.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001464, whisper_loss=0.09091, over 3884456.44 frames. ], batch size: 90, lr: 2.44e-03, grad_scale: 1.152921504606847e+18 2024-08-18 03:59:43,378 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 03:59:47,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.59 vs. limit=12.0 2024-08-18 03:59:59,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3683920.0, ans=0.0 2024-08-18 04:00:03,920 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.722e+01 2.308e+01 2.481e+01 2.750e+01 3.346e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-18 04:00:10,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-18 04:00:13,066 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 19 from LS+wenet, 29 from Vox, 27 fro AS 2024-08-18 04:00:31,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2024-08-18 04:00:47,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3684220.0, ans=0.125 2024-08-18 04:00:52,028 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 04:00:53,191 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10800, loss[loss=0.08243, beats_loss=0.01058, ecapa_loss=0.0001906, whisper_loss=0.06994, over 16982.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.0001472, whisper_loss=0.09093, over 3923111.39 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:00:56,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3684320.0, ans=0.0 2024-08-18 04:01:07,307 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 04:01:12,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-18 04:01:14,739 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 35 from Vox, 27 fro AS 2024-08-18 04:01:15,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.38 vs. limit=10.0 2024-08-18 04:01:23,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3684520.0, ans=0.125 2024-08-18 04:01:25,689 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 04:01:44,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3684620.0, ans=0.2 2024-08-18 04:01:46,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3684620.0, ans=0.125 2024-08-18 04:01:46,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3684620.0, ans=0.125 2024-08-18 04:01:51,099 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 04:02:06,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10850, loss[loss=0.109, beats_loss=0.01203, ecapa_loss=0.0001155, whisper_loss=0.09582, over 21502.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01048, ecapa_loss=0.0001475, whisper_loss=0.091, over 3917662.86 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:02:06,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3684820.0, ans=0.1 2024-08-18 04:02:08,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0 2024-08-18 04:02:09,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3684820.0, ans=0.1 2024-08-18 04:02:23,324 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
39 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 04:02:29,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3684920.0, ans=0.2 2024-08-18 04:02:34,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.394e+01 2.623e+01 3.020e+01 4.318e+02, threshold=5.247e+01, percent-clipped=1.0 2024-08-18 04:02:36,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3685020.0, ans=0.0 2024-08-18 04:02:42,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3685020.0, ans=0.125 2024-08-18 04:02:50,623 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 04:02:52,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2024-08-18 04:02:59,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-08-18 04:03:08,377 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 04:03:19,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10900, loss[loss=0.1194, beats_loss=0.008516, ecapa_loss=0.0001383, whisper_loss=0.1095, over 19537.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001473, whisper_loss=0.09161, over 3923056.68 frames. 
], batch size: 75, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:03:48,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3685520.0, ans=0.125 2024-08-18 04:03:48,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0 2024-08-18 04:03:58,551 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.00 vs. limit=22.5 2024-08-18 04:04:08,946 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 04:04:31,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 10950, loss[loss=0.1192, beats_loss=0.009574, ecapa_loss=0.000161, whisper_loss=0.108, over 20746.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001476, whisper_loss=0.09124, over 3889952.81 frames. ], batch size: 82, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:04:37,322 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 04:04:38,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3685820.0, ans=0.0 2024-08-18 04:04:58,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.317e+01 2.614e+01 2.946e+01 3.732e+01, threshold=5.227e+01, percent-clipped=0.0 2024-08-18 04:05:00,351 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 04:05:04,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3686020.0, ans=0.1 2024-08-18 04:05:06,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3686020.0, ans=0.125 2024-08-18 04:05:07,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3686020.0, ans=10.0 2024-08-18 04:05:16,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3686120.0, ans=0.2 2024-08-18 04:05:22,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3686120.0, ans=0.125 2024-08-18 04:05:31,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2024-08-18 04:05:43,811 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11000, loss[loss=0.1054, beats_loss=0.01152, ecapa_loss=0.0001335, whisper_loss=0.09255, over 19490.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.0001478, whisper_loss=0.09109, over 3859800.14 frames. ], batch size: 77, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:05:43,993 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 12 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 04:05:53,121 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 04:05:54,903 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 04:06:12,085 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 04:06:19,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3686520.0, ans=15.0 2024-08-18 04:06:33,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.55 vs. limit=10.0 2024-08-18 04:06:43,429 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 04:06:50,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2024-08-18 04:06:53,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3686720.0, ans=0.1 2024-08-18 04:06:53,709 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0 2024-08-18 04:06:54,157 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 04:06:58,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3686820.0, ans=0.125 2024-08-18 04:06:59,326 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11050, loss[loss=0.09199, beats_loss=0.01323, ecapa_loss=0.0001088, whisper_loss=0.07767, over 16779.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.000147, whisper_loss=0.09037, over 3856997.03 frames. 
], batch size: 66, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:07:25,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.316e+01 2.430e+01 2.735e+01 3.688e+01, threshold=4.860e+01, percent-clipped=0.0 2024-08-18 04:07:27,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687020.0, ans=0.1 2024-08-18 04:07:27,568 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=5.264e-03 2024-08-18 04:07:49,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3687120.0, ans=0.1 2024-08-18 04:07:50,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3687120.0, ans=0.125 2024-08-18 04:07:59,231 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-18 04:08:03,170 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 04:08:06,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3687220.0, ans=0.125 2024-08-18 04:08:07,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3687320.0, ans=0.125 2024-08-18 04:08:08,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11100, loss[loss=0.09181, beats_loss=0.01165, ecapa_loss=0.0001545, whisper_loss=0.07861, over 23209.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001474, whisper_loss=0.08973, over 3848156.77 frames. 
], batch size: 95, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:08:15,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3687320.0, ans=0.1 2024-08-18 04:08:24,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3687420.0, ans=0.125 2024-08-18 04:08:34,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.34 vs. limit=22.5 2024-08-18 04:08:39,298 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 04:08:49,461 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 04:09:11,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3687720.0, ans=0.0 2024-08-18 04:09:13,194 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0 2024-08-18 04:09:20,750 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 04:09:24,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11150, loss[loss=0.115, beats_loss=0.01107, ecapa_loss=0.0001325, whisper_loss=0.1026, over 22252.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001468, whisper_loss=0.09042, over 3864129.70 frames. ], batch size: 89, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:09:48,651 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
16 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 04:09:53,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.407e+01 2.639e+01 3.028e+01 3.278e+02, threshold=5.278e+01, percent-clipped=1.0 2024-08-18 04:10:06,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3688020.0, ans=0.2 2024-08-18 04:10:13,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3688120.0, ans=0.0 2024-08-18 04:10:29,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3688220.0, ans=0.2 2024-08-18 04:10:39,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688220.0, ans=0.1 2024-08-18 04:10:42,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3688320.0, ans=0.1 2024-08-18 04:10:43,177 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11200, loss[loss=0.1071, beats_loss=0.01152, ecapa_loss=7.619e-05, whisper_loss=0.09481, over 17686.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001456, whisper_loss=0.09086, over 3866252.36 frames. ], batch size: 62, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:10:59,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3688420.0, ans=0.0 2024-08-18 04:10:59,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3688420.0, ans=0.0 2024-08-18 04:11:06,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. 
limit=15.0 2024-08-18 04:11:59,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11250, loss[loss=0.1068, beats_loss=0.01054, ecapa_loss=0.0001459, whisper_loss=0.09477, over 17290.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01057, ecapa_loss=0.0001464, whisper_loss=0.09116, over 3893948.08 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:12:06,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3688820.0, ans=0.125 2024-08-18 04:12:23,857 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 04:12:27,970 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.315e+01 2.575e+01 2.987e+01 7.220e+01, threshold=5.150e+01, percent-clipped=1.0 2024-08-18 04:12:29,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.95 vs. limit=15.0 2024-08-18 04:12:34,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3689020.0, ans=0.07 2024-08-18 04:12:38,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3689020.0, ans=0.125 2024-08-18 04:12:58,492 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 04:13:12,433 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.203e+00 2024-08-18 04:13:14,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11300, loss[loss=0.1175, beats_loss=0.01049, ecapa_loss=0.0001722, whisper_loss=0.1053, over 21155.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01054, ecapa_loss=0.000145, whisper_loss=0.09086, over 3889355.40 frames. 
], batch size: 88, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:13:35,212 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 9 from Vox, 36 fro AS 2024-08-18 04:13:37,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2024-08-18 04:13:50,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689520.0, ans=0.125 2024-08-18 04:13:58,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3689520.0, ans=0.0 2024-08-18 04:14:01,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3689620.0, ans=0.125 2024-08-18 04:14:09,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3689620.0, ans=0.09899494936611666 2024-08-18 04:14:33,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11350, loss[loss=0.08887, beats_loss=0.01227, ecapa_loss=0.0001491, whisper_loss=0.0751, over 14171.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.09072, over 3888990.22 frames. 
], batch size: 57, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:14:35,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3689820.0, ans=0.125 2024-08-18 04:14:39,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3689820.0, ans=0.125 2024-08-18 04:14:42,774 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.659e+00 2024-08-18 04:14:45,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. limit=15.0 2024-08-18 04:14:56,646 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 04:14:56,992 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:15:03,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.324e+01 2.496e+01 2.827e+01 4.121e+01, threshold=4.992e+01, percent-clipped=0.0 2024-08-18 04:15:19,413 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 04:15:34,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3690220.0, ans=0.0 2024-08-18 04:15:37,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3690220.0, ans=0.1 2024-08-18 04:15:49,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11400, loss[loss=0.09945, beats_loss=0.01168, ecapa_loss=0.000159, whisper_loss=0.08618, over 22913.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.09032, over 3849180.25 frames. 
], batch size: 94, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:15:58,237 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-18 04:16:06,416 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-08-18 04:16:07,140 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 04:16:32,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3690520.0, ans=0.125 2024-08-18 04:16:33,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3690520.0, ans=0.125 2024-08-18 04:16:36,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3690520.0, ans=0.125 2024-08-18 04:16:41,638 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 04:16:45,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3690620.0, ans=0.125 2024-08-18 04:16:50,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-18 04:17:07,985 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11450, loss[loss=0.1147, beats_loss=0.0102, ecapa_loss=0.000129, whisper_loss=0.1032, over 20199.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.000147, whisper_loss=0.08993, over 3828099.68 frames. 
], batch size: 76, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:17:13,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3690820.0, ans=0.1 2024-08-18 04:17:17,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3690820.0, ans=0.0 2024-08-18 04:17:25,632 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 04:17:32,233 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 04:17:37,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.265e+01 2.441e+01 2.752e+01 3.778e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 04:18:13,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3691220.0, ans=0.1 2024-08-18 04:18:26,243 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11500, loss[loss=0.1205, beats_loss=0.01014, ecapa_loss=0.0001527, whisper_loss=0.1088, over 18822.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001461, whisper_loss=0.09114, over 3865853.59 frames. ], batch size: 74, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:18:59,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=12.0 2024-08-18 04:19:08,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3691520.0, ans=0.2 2024-08-18 04:19:12,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. 
limit=6.0 2024-08-18 04:19:20,191 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=22.5 2024-08-18 04:19:35,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3691720.0, ans=0.125 2024-08-18 04:19:42,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11550, loss[loss=0.1142, beats_loss=0.01105, ecapa_loss=0.0001486, whisper_loss=0.1016, over 22772.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001449, whisper_loss=0.0912, over 3863764.48 frames. ], batch size: 93, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:19:54,628 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 04:20:10,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3691920.0, ans=0.125 2024-08-18 04:20:12,647 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.438e+01 2.681e+01 2.984e+01 2.148e+02, threshold=5.363e+01, percent-clipped=1.0 2024-08-18 04:20:25,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3692020.0, ans=0.0 2024-08-18 04:20:26,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3692120.0, ans=0.0 2024-08-18 04:20:33,497 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 04:20:43,005 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.29 vs. 
limit=15.0 2024-08-18 04:20:49,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3692220.0, ans=0.125 2024-08-18 04:20:56,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11600, loss[loss=0.07104, beats_loss=0.01325, ecapa_loss=0.0001457, whisper_loss=0.05634, over 19837.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001455, whisper_loss=0.09086, over 3860944.91 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:20:56,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3692320.0, ans=0.0 2024-08-18 04:21:13,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3692420.0, ans=0.0 2024-08-18 04:21:20,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3692420.0, ans=0.125 2024-08-18 04:21:25,943 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 20 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 04:21:27,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3692520.0, ans=0.0 2024-08-18 04:21:27,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3692520.0, ans=0.125 2024-08-18 04:21:28,629 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 04:21:37,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3692620.0, ans=0.125 2024-08-18 04:21:43,450 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-18 04:21:51,456 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
24 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 04:22:07,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3692720.0, ans=0.2 2024-08-18 04:22:07,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3692720.0, ans=0.125 2024-08-18 04:22:08,381 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 04:22:09,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11650, loss[loss=0.09571, beats_loss=0.01133, ecapa_loss=0.0001585, whisper_loss=0.0828, over 17398.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001443, whisper_loss=0.09071, over 3882888.33 frames. ], batch size: 73, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:22:37,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.291e+01 2.477e+01 2.731e+01 1.047e+02, threshold=4.954e+01, percent-clipped=2.0 2024-08-18 04:22:39,219 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 04:22:49,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3693020.0, ans=15.0 2024-08-18 04:22:51,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3693020.0, ans=0.0 2024-08-18 04:22:52,245 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 04:23:02,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3693120.0, ans=0.2 2024-08-18 04:23:13,059 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 04:23:19,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3693220.0, ans=0.125 2024-08-18 04:23:21,288 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11700, loss[loss=0.1375, beats_loss=0.007132, ecapa_loss=0.0001452, whisper_loss=0.1289, over 18072.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0107, ecapa_loss=0.0001443, whisper_loss=0.09033, over 3908706.13 frames. ], batch size: 67, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:23:26,823 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 04:23:27,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=3693320.0, ans=0.1 2024-08-18 04:23:29,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3693320.0, ans=0.1 2024-08-18 04:23:38,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3693420.0, ans=0.125 2024-08-18 04:24:18,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3693720.0, ans=0.07 2024-08-18 04:24:31,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3693720.0, ans=0.0 2024-08-18 04:24:33,969 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11750, loss[loss=0.1098, beats_loss=0.01186, ecapa_loss=0.0001315, whisper_loss=0.09659, over 22836.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001447, whisper_loss=0.0909, over 3897488.82 frames. 
], batch size: 90, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:24:34,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3693820.0, ans=0.125 2024-08-18 04:24:40,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-08-18 04:24:53,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3693920.0, ans=0.95 2024-08-18 04:25:01,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.257e+01 2.580e+01 2.883e+01 7.198e+01, threshold=5.159e+01, percent-clipped=2.0 2024-08-18 04:25:02,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3694020.0, ans=0.125 2024-08-18 04:25:04,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3694020.0, ans=0.125 2024-08-18 04:25:17,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3694120.0, ans=0.0 2024-08-18 04:25:28,323 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 04:25:40,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.96 vs. limit=22.5 2024-08-18 04:25:46,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3694320.0, ans=0.0 2024-08-18 04:25:47,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11800, loss[loss=0.08778, beats_loss=0.01427, ecapa_loss=0.0001342, whisper_loss=0.07216, over 18787.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01078, ecapa_loss=0.0001435, whisper_loss=0.09016, over 3895149.73 frames. ], batch size: 80, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:25:57,605 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 04:26:00,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.03 vs. limit=22.5 2024-08-18 04:26:13,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3694520.0, ans=0.125 2024-08-18 04:26:16,228 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 15 from LS+wenet, 26 from Vox, 28 fro AS 2024-08-18 04:26:21,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2024-08-18 04:26:34,868 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 04:26:51,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. limit=15.0 2024-08-18 04:26:54,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11850, loss[loss=0.08054, beats_loss=0.01166, ecapa_loss=0.0001242, whisper_loss=0.06765, over 15427.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01078, ecapa_loss=0.0001434, whisper_loss=0.0897, over 3925466.63 frames. ], batch size: 59, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:26:54,731 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 04:26:58,227 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:27:04,297 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
19 from LS+wenet, 34 from Vox, 39 fro AS 2024-08-18 04:27:12,032 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.655e+00 2024-08-18 04:27:14,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. limit=6.0 2024-08-18 04:27:19,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.338e+01 2.535e+01 2.911e+01 4.657e+01, threshold=5.070e+01, percent-clipped=0.0 2024-08-18 04:27:26,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=3695020.0, ans=0.05 2024-08-18 04:27:54,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3695220.0, ans=0.05 2024-08-18 04:28:01,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11900, loss[loss=0.1045, beats_loss=0.01238, ecapa_loss=0.0001598, whisper_loss=0.09048, over 20720.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001444, whisper_loss=0.09049, over 3914083.36 frames. ], batch size: 84, lr: 2.44e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:28:11,561 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 04:28:23,632 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
13 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 04:28:26,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3695420.0, ans=0.1 2024-08-18 04:28:39,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3695520.0, ans=0.2 2024-08-18 04:28:40,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3695620.0, ans=0.125 2024-08-18 04:28:51,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.69 vs. limit=15.0 2024-08-18 04:28:52,296 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 04:28:57,707 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 04:28:57,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3695720.0, ans=0.2 2024-08-18 04:29:07,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 11950, loss[loss=0.1002, beats_loss=0.009185, ecapa_loss=0.0001838, whisper_loss=0.08913, over 13328.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09079, over 3872041.18 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:29:14,483 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0 2024-08-18 04:29:22,797 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 04:29:31,521 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.580e+01 2.286e+01 2.533e+01 2.895e+01 4.370e+02, threshold=5.067e+01, percent-clipped=3.0 2024-08-18 04:29:39,259 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-18 04:29:39,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3696020.0, ans=0.125 2024-08-18 04:29:45,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3696120.0, ans=0.125 2024-08-18 04:29:58,057 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 04:29:59,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3696220.0, ans=0.0 2024-08-18 04:30:00,622 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 04:30:11,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3696320.0, ans=0.125 2024-08-18 04:30:12,563 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12000, loss[loss=0.08928, beats_loss=0.01191, ecapa_loss=0.0001632, whisper_loss=0.07573, over 20971.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001456, whisper_loss=0.09084, over 3873308.28 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:30:12,564 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 04:30:50,029 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005296, whisper_loss=0.2489, over 922467.00 frames. 
2024-08-18 04:31:05,447 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on SV_voxceleb1: loss=0.004038, beats_loss=0, ecapa_loss=0.0004038, whisper_loss=0, over 939242.00 frames. 2024-08-18 04:31:35,683 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.7365, 2.0690, 1.7210, 1.4575, 1.6737, 1.4677, 1.8629, 1.7284], device='cuda:3') 2024-08-18 04:32:43,942 INFO [train_multi_KD3.py:1149] (3/4) Epoch 25, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 04:32:43,946 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 04:32:53,735 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 04:33:04,971 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 04:33:07,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3696420.0, ans=0.0 2024-08-18 04:33:07,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3696420.0, ans=0.125 2024-08-18 04:33:22,166 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
27 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-18 04:33:32,953 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09565281867980957, model_norm_threshold=50.66883087158203 2024-08-18 04:33:33,120 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.2.self_attn_weights.linear_pos.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.699e+04, grad_sumsq=4.144e+03, orig_rms_sq=8.927e+00 2024-08-18 04:33:36,941 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07986252754926682, model_norm_threshold=50.66883087158203 2024-08-18 04:33:37,106 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.743e+04, grad_sumsq=4.635e+06, orig_rms_sq=1.023e-02 2024-08-18 04:33:38,570 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 04:33:43,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3696720.0, ans=0.04949747468305833 2024-08-18 04:33:53,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12050, loss[loss=0.1238, beats_loss=0.009188, ecapa_loss=0.0001543, whisper_loss=0.1131, over 23131.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001455, whisper_loss=0.09012, over 3857667.41 frames. 
], batch size: 89, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:34:06,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3696920.0, ans=0.125 2024-08-18 04:34:20,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.681e+01 2.233e+01 2.436e+01 2.807e+01 6.345e+02, threshold=4.872e+01, percent-clipped=3.0 2024-08-18 04:34:22,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2024-08-18 04:34:49,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3697220.0, ans=0.125 2024-08-18 04:34:50,917 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 04:35:02,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12100, loss[loss=0.1021, beats_loss=0.01141, ecapa_loss=0.0001296, whisper_loss=0.08943, over 19646.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001471, whisper_loss=0.09004, over 3844471.18 frames. ], batch size: 76, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:35:04,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3697320.0, ans=0.125 2024-08-18 04:35:24,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3697420.0, ans=10.0 2024-08-18 04:35:26,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3697420.0, ans=10.0 2024-08-18 04:35:41,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.69 vs. 
limit=15.0 2024-08-18 04:35:45,824 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-18 04:35:47,299 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 04:35:58,712 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 04:36:08,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12150, loss[loss=0.09238, beats_loss=0.01047, ecapa_loss=0.0001452, whisper_loss=0.08047, over 13914.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.08983, over 3800900.34 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:36:32,325 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.238e+01 2.438e+01 2.753e+01 4.276e+01, threshold=4.877e+01, percent-clipped=0.0 2024-08-18 04:36:42,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3698020.0, ans=0.0 2024-08-18 04:36:45,022 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 04:36:52,193 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 04:36:57,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3698120.0, ans=0.125 2024-08-18 04:37:04,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3698220.0, ans=0.0 2024-08-18 04:37:05,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3698220.0, ans=0.125 2024-08-18 04:37:05,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3698220.0, ans=0.1 2024-08-18 04:37:06,522 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 04:37:12,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12200, loss[loss=0.1126, beats_loss=0.01084, ecapa_loss=0.000141, whisper_loss=0.1004, over 23007.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01055, ecapa_loss=0.0001463, whisper_loss=0.08945, over 3801713.68 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:37:13,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3698320.0, ans=0.05 2024-08-18 04:37:20,428 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 04:37:24,341 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 04:37:46,972 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 04:37:53,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3698620.0, ans=0.05 2024-08-18 04:37:59,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-08-18 04:38:03,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3698720.0, ans=0.2 2024-08-18 04:38:06,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3698720.0, ans=0.09899494936611666 2024-08-18 04:38:15,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12250, loss[loss=0.06639, beats_loss=0.008956, ecapa_loss=0.0001712, whisper_loss=0.05573, over 13350.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001466, whisper_loss=0.08997, over 3777353.85 frames. ], batch size: 54, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:38:26,234 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 04:38:40,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.471e+01 2.671e+01 3.119e+01 8.520e+01, threshold=5.341e+01, percent-clipped=2.0 2024-08-18 04:39:08,122 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 04:39:11,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3699220.0, ans=0.125 2024-08-18 04:39:19,480 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12300, loss[loss=0.09725, beats_loss=0.01056, ecapa_loss=0.0001526, whisper_loss=0.08517, over 22009.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001459, whisper_loss=0.09012, over 3798557.32 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:39:23,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3699320.0, ans=0.125 2024-08-18 04:39:31,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3699420.0, ans=0.0 2024-08-18 04:39:34,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3699420.0, ans=0.125 2024-08-18 04:39:44,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3699520.0, ans=0.0 2024-08-18 04:39:51,463 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 04:40:02,820 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 04:40:14,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3699720.0, ans=0.0 2024-08-18 04:40:21,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12350, loss[loss=0.08777, beats_loss=0.01216, ecapa_loss=9.946e-05, whisper_loss=0.07461, over 18890.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001464, whisper_loss=0.08997, over 3792186.73 frames. ], batch size: 72, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:40:22,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3699820.0, ans=0.125 2024-08-18 04:40:24,360 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
13 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 04:40:31,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3699820.0, ans=0.2 2024-08-18 04:40:39,878 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 04:40:45,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.044e+01 2.495e+01 2.683e+01 3.048e+01 4.110e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-18 04:40:56,775 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 04:40:59,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3700120.0, ans=0.1 2024-08-18 04:41:05,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3700120.0, ans=0.0 2024-08-18 04:41:07,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700120.0, ans=0.1 2024-08-18 04:41:10,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3700120.0, ans=0.125 2024-08-18 04:41:23,798 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 04:41:24,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12400, loss[loss=0.1108, beats_loss=0.01035, ecapa_loss=0.0001215, whisper_loss=0.09921, over 23256.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.08968, over 3823765.49 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:41:26,144 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 04:41:31,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3700320.0, ans=0.0 2024-08-18 04:41:34,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3700320.0, ans=0.125 2024-08-18 04:41:34,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.11 vs. limit=12.0 2024-08-18 04:41:48,596 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 04:41:52,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3700520.0, ans=0.1 2024-08-18 04:42:04,229 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 04:42:05,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3700620.0, ans=0.0 2024-08-18 04:42:09,758 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 04:42:10,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-18 04:42:16,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3700720.0, ans=0.125 2024-08-18 04:42:18,000 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 04:42:18,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3700720.0, ans=0.125 2024-08-18 04:42:20,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3700720.0, ans=0.0 2024-08-18 04:42:26,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12450, loss[loss=0.09363, beats_loss=0.01194, ecapa_loss=0.0001259, whisper_loss=0.08043, over 19754.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001452, whisper_loss=0.08899, over 3847531.67 frames. ], batch size: 78, lr: 2.43e-03, grad_scale: 5.764607523034235e+17 2024-08-18 04:42:32,166 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 04:42:40,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3700920.0, ans=0.125 2024-08-18 04:42:41,504 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 31 from Vox, 25 fro AS 2024-08-18 04:42:43,785 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 04:42:50,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.277e+01 2.499e+01 2.886e+01 6.809e+01, threshold=4.997e+01, percent-clipped=1.0 2024-08-18 04:42:57,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. 
limit=15.0 2024-08-18 04:43:04,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3701120.0, ans=0.5 2024-08-18 04:43:04,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3701120.0, ans=0.125 2024-08-18 04:43:20,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3701220.0, ans=0.1 2024-08-18 04:43:28,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12500, loss[loss=0.107, beats_loss=0.01063, ecapa_loss=0.0001443, whisper_loss=0.09495, over 22560.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001454, whisper_loss=0.08968, over 3836526.76 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:43:33,345 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 04:43:43,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3701420.0, ans=0.0 2024-08-18 04:44:01,651 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 04:44:04,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3701620.0, ans=0.125 2024-08-18 04:44:14,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3701620.0, ans=0.125 2024-08-18 04:44:15,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=3701620.0, ans=12.0 2024-08-18 04:44:30,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12550, loss[loss=0.1314, beats_loss=0.007097, ecapa_loss=0.0001591, whisper_loss=0.1227, over 16074.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001469, whisper_loss=0.09054, over 3874436.09 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:44:33,331 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 04:44:37,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3701820.0, ans=0.0 2024-08-18 04:44:55,885 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.343e+01 2.607e+01 2.934e+01 3.730e+01, threshold=5.215e+01, percent-clipped=0.0 2024-08-18 04:44:59,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3702020.0, ans=10.0 2024-08-18 04:45:08,839 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 04:45:29,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3702220.0, ans=0.125 2024-08-18 04:45:33,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12600, loss[loss=0.1151, beats_loss=0.009165, ecapa_loss=0.0001295, whisper_loss=0.1046, over 16919.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001465, whisper_loss=0.09085, over 3886447.83 frames. ], batch size: 66, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:45:46,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3702420.0, ans=0.07 2024-08-18 04:45:47,541 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 35 from Vox, 33 fro AS 2024-08-18 04:45:54,882 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
24 from LS+wenet, 10 from Vox, 32 fro AS 2024-08-18 04:45:56,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3702420.0, ans=0.125 2024-08-18 04:45:57,898 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 04:45:59,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3702520.0, ans=0.1 2024-08-18 04:46:03,993 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 04:46:08,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3702520.0, ans=0.125 2024-08-18 04:46:09,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2024-08-18 04:46:12,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3702620.0, ans=0.0 2024-08-18 04:46:36,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12650, loss[loss=0.09064, beats_loss=0.01228, ecapa_loss=0.0001264, whisper_loss=0.07709, over 15927.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001466, whisper_loss=0.09028, over 3898600.90 frames. ], batch size: 63, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:46:38,730 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 04:46:45,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3702820.0, ans=0.2 2024-08-18 04:46:56,289 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 04:47:01,281 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.719e+01 2.352e+01 2.595e+01 2.962e+01 3.883e+01, threshold=5.190e+01, percent-clipped=0.0 2024-08-18 04:47:14,919 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 04:47:18,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.43 vs. limit=22.5 2024-08-18 04:47:18,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3703120.0, ans=0.1 2024-08-18 04:47:22,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2024-08-18 04:47:23,421 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 04:47:33,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3703220.0, ans=0.0 2024-08-18 04:47:38,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12700, loss[loss=0.1018, beats_loss=0.01104, ecapa_loss=0.000142, whisper_loss=0.08936, over 18785.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001468, whisper_loss=0.08999, over 3907894.72 frames. ], batch size: 73, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:47:42,315 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 04:47:46,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3703320.0, ans=0.125 2024-08-18 04:47:52,893 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.14 vs. limit=22.5 2024-08-18 04:47:56,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-18 04:47:57,398 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 04:48:15,895 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 04:48:17,057 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 04:48:18,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3703620.0, ans=0.0 2024-08-18 04:48:34,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3703720.0, ans=0.0 2024-08-18 04:48:34,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3703720.0, ans=0.0 2024-08-18 04:48:40,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12750, loss[loss=0.1122, beats_loss=0.01088, ecapa_loss=0.0001293, whisper_loss=0.1, over 22737.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01066, ecapa_loss=0.0001467, whisper_loss=0.08964, over 3898258.41 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:48:55,131 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 04:48:59,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3703920.0, ans=0.125 2024-08-18 04:49:05,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.358e+01 2.568e+01 2.897e+01 5.259e+01, threshold=5.137e+01, percent-clipped=1.0 2024-08-18 04:49:09,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3704020.0, ans=10.0 2024-08-18 04:49:13,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3704020.0, ans=0.1 2024-08-18 04:49:19,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3704120.0, ans=0.0 2024-08-18 04:49:29,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3704220.0, ans=0.125 2024-08-18 04:49:33,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3704220.0, ans=0.125 2024-08-18 04:49:34,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3704220.0, ans=0.2 2024-08-18 04:49:34,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3704220.0, ans=0.0 2024-08-18 04:49:36,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3704220.0, ans=0.125 2024-08-18 04:49:42,416 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12800, loss[loss=0.0815, beats_loss=0.01223, ecapa_loss=0.0001147, whisper_loss=0.06812, over 18716.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001466, whisper_loss=0.08994, over 3885295.46 frames. ], batch size: 73, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:49:49,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3704320.0, ans=0.125 2024-08-18 04:49:51,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3704320.0, ans=0.05 2024-08-18 04:50:03,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.39 vs. limit=15.0 2024-08-18 04:50:09,192 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 04:50:09,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3704520.0, ans=0.2 2024-08-18 04:50:19,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3704620.0, ans=0.0 2024-08-18 04:50:30,425 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 04:50:30,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3704620.0, ans=0.0 2024-08-18 04:50:31,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3704720.0, ans=0.125 2024-08-18 04:50:42,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3704720.0, ans=0.125 2024-08-18 04:50:45,706 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12850, loss[loss=0.103, beats_loss=0.01276, ecapa_loss=0.000136, whisper_loss=0.0889, over 20467.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01071, ecapa_loss=0.0001467, whisper_loss=0.08913, over 3848857.68 frames. ], batch size: 83, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:50:51,937 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 04:50:54,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3704820.0, ans=0.07 2024-08-18 04:50:58,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-08-18 04:51:02,066 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 04:51:10,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.342e+01 2.570e+01 2.903e+01 1.113e+02, threshold=5.140e+01, percent-clipped=1.0 2024-08-18 04:51:29,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3705120.0, ans=0.1 2024-08-18 04:51:41,005 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 04:51:42,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3705220.0, ans=0.025 2024-08-18 04:51:42,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-18 04:51:48,015 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12900, loss[loss=0.07375, beats_loss=0.0123, ecapa_loss=0.0001266, whisper_loss=0.06018, over 21457.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01066, ecapa_loss=0.0001466, whisper_loss=0.0891, over 3850282.19 frames. 
], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:51:49,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3705320.0, ans=0.07 2024-08-18 04:51:54,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3705320.0, ans=0.125 2024-08-18 04:52:29,910 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-18 04:52:42,752 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 04:52:47,515 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 04:52:49,886 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 12950, loss[loss=0.1134, beats_loss=0.01126, ecapa_loss=0.0001746, whisper_loss=0.1004, over 21911.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01068, ecapa_loss=0.0001458, whisper_loss=0.08892, over 3892537.76 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:53:15,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.225e+01 2.385e+01 2.707e+01 3.126e+02, threshold=4.771e+01, percent-clipped=1.0 2024-08-18 04:53:29,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3706120.0, ans=0.05 2024-08-18 04:53:32,688 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-18 04:53:35,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3706120.0, ans=0.0 2024-08-18 04:53:36,578 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
23 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 04:53:43,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3706220.0, ans=0.125 2024-08-18 04:53:52,846 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13000, loss[loss=0.09944, beats_loss=0.01118, ecapa_loss=0.0001551, whisper_loss=0.08671, over 21435.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001469, whisper_loss=0.08966, over 3885128.59 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:54:24,317 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 04:54:40,812 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 04:54:43,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3706720.0, ans=0.1 2024-08-18 04:54:49,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3706720.0, ans=0.0 2024-08-18 04:54:53,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3706720.0, ans=0.125 2024-08-18 04:54:55,449 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13050, loss[loss=0.1046, beats_loss=0.01014, ecapa_loss=0.0001407, whisper_loss=0.093, over 15451.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001459, whisper_loss=0.08957, over 3886780.37 frames. ], batch size: 61, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:55:08,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.44 vs. 
limit=22.5 2024-08-18 04:55:08,960 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2024-08-18 04:55:20,638 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.374e+01 2.590e+01 2.851e+01 4.425e+02, threshold=5.179e+01, percent-clipped=1.0 2024-08-18 04:55:28,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3707020.0, ans=0.07 2024-08-18 04:55:39,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2024-08-18 04:55:40,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3707120.0, ans=0.125 2024-08-18 04:55:44,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=3707220.0, ans=12.0 2024-08-18 04:55:48,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3707220.0, ans=0.2 2024-08-18 04:55:57,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13100, loss[loss=0.1113, beats_loss=0.008031, ecapa_loss=0.0001488, whisper_loss=0.1018, over 14395.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001464, whisper_loss=0.09031, over 3891387.47 frames. ], batch size: 56, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:55:59,119 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-18 04:56:05,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3707320.0, ans=0.2 2024-08-18 04:56:25,411 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-18 04:56:25,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3707520.0, ans=0.1 2024-08-18 04:56:31,397 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 04:56:46,971 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=2.273e-01 2024-08-18 04:56:59,803 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13150, loss[loss=0.09218, beats_loss=0.009986, ecapa_loss=0.0001649, whisper_loss=0.08055, over 22063.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.000146, whisper_loss=0.0903, over 3863166.55 frames. ], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:57:10,379 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.155e+01 2024-08-18 04:57:10,461 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-08-18 04:57:16,466 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 04:57:25,548 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.230e+01 2.439e+01 2.719e+01 6.493e+01, threshold=4.878e+01, percent-clipped=1.0 2024-08-18 04:57:27,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3708020.0, ans=0.125 2024-08-18 04:57:33,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3708020.0, ans=0.0 2024-08-18 04:57:46,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3708120.0, ans=0.125 2024-08-18 04:57:51,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2024-08-18 04:58:02,832 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13200, loss[loss=0.105, beats_loss=0.01151, ecapa_loss=0.000142, whisper_loss=0.09204, over 20898.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001449, whisper_loss=0.09087, over 3872513.91 frames. ], batch size: 85, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:58:05,450 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 04:58:08,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3708320.0, ans=0.1 2024-08-18 04:58:23,257 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 04:58:25,608 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
19 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 04:58:33,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3708520.0, ans=0.125 2024-08-18 04:58:36,693 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 04:59:05,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13250, loss[loss=0.104, beats_loss=0.009376, ecapa_loss=0.000167, whisper_loss=0.09296, over 22367.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01034, ecapa_loss=0.000146, whisper_loss=0.09145, over 3842553.66 frames. ], batch size: 94, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 04:59:06,505 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 04:59:08,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3708820.0, ans=0.1 2024-08-18 04:59:09,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3708820.0, ans=0.2 2024-08-18 04:59:17,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3708920.0, ans=0.0 2024-08-18 04:59:30,032 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.594e+01 2.263e+01 2.511e+01 2.835e+01 4.406e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-18 04:59:44,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3709120.0, ans=0.1 2024-08-18 04:59:45,690 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.15 vs. 
limit=22.5 2024-08-18 05:00:02,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3709220.0, ans=0.2 2024-08-18 05:00:07,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13300, loss[loss=0.08422, beats_loss=0.01355, ecapa_loss=0.0001328, whisper_loss=0.06934, over 15909.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01038, ecapa_loss=0.0001465, whisper_loss=0.09139, over 3850467.28 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:00:11,257 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 05:00:12,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3709320.0, ans=0.125 2024-08-18 05:00:16,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3709320.0, ans=0.0 2024-08-18 05:00:25,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3709420.0, ans=0.125 2024-08-18 05:00:37,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3709520.0, ans=0.1 2024-08-18 05:00:44,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. 
limit=15.0 2024-08-18 05:00:46,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3709620.0, ans=0.125 2024-08-18 05:00:57,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3709720.0, ans=0.125 2024-08-18 05:01:00,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-08-18 05:01:05,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3709720.0, ans=0.125 2024-08-18 05:01:09,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13350, loss[loss=0.0969, beats_loss=0.009354, ecapa_loss=0.000126, whisper_loss=0.08629, over 15854.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001459, whisper_loss=0.09118, over 3869340.59 frames. ], batch size: 61, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:01:19,851 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 05:01:26,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-18 05:01:34,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.554e+01 2.429e+01 2.621e+01 2.901e+01 4.949e+01, threshold=5.243e+01, percent-clipped=0.0 2024-08-18 05:01:43,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3710020.0, ans=0.125 2024-08-18 05:01:44,044 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
20 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 05:01:52,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3710120.0, ans=0.125 2024-08-18 05:01:59,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3710220.0, ans=0.125 2024-08-18 05:02:10,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.17 vs. limit=15.0 2024-08-18 05:02:13,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13400, loss[loss=0.1015, beats_loss=0.01001, ecapa_loss=0.0001379, whisper_loss=0.09013, over 16326.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01033, ecapa_loss=0.0001462, whisper_loss=0.09191, over 3866447.19 frames. ], batch size: 64, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:02:14,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3710320.0, ans=0.0 2024-08-18 05:02:28,411 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 05:02:42,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3710520.0, ans=0.125 2024-08-18 05:02:46,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2024-08-18 05:02:58,707 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 05:03:02,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.70 vs. 
limit=15.0 2024-08-18 05:03:16,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13450, loss[loss=0.09891, beats_loss=0.01024, ecapa_loss=0.0001299, whisper_loss=0.08737, over 18051.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01039, ecapa_loss=0.0001467, whisper_loss=0.0909, over 3868898.70 frames. ], batch size: 70, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:03:34,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3710920.0, ans=0.125 2024-08-18 05:03:35,739 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 05:03:38,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3710920.0, ans=0.125 2024-08-18 05:03:41,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.380e+01 2.555e+01 2.922e+01 3.832e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-18 05:03:47,931 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 05:03:59,418 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 05:04:01,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3711120.0, ans=0.125 2024-08-18 05:04:04,212 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 05:04:18,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13500, loss[loss=0.09512, beats_loss=0.0105, ecapa_loss=0.0001226, whisper_loss=0.0834, over 20615.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01034, ecapa_loss=0.0001466, whisper_loss=0.09179, over 3886476.27 frames. ], batch size: 79, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:04:22,441 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 05:04:38,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3711420.0, ans=0.0 2024-08-18 05:04:57,626 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 05:05:16,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=22.5 2024-08-18 05:05:19,770 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 05:05:20,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13550, loss[loss=0.1039, beats_loss=0.009686, ecapa_loss=0.0001779, whisper_loss=0.09246, over 19636.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.000148, whisper_loss=0.09139, over 3863461.05 frames. ], batch size: 81, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:05:21,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-18 05:05:27,322 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 05:05:28,564 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
23 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 05:05:32,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3711920.0, ans=0.0 2024-08-18 05:05:36,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3711920.0, ans=0.2 2024-08-18 05:05:36,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3711920.0, ans=0.0 2024-08-18 05:05:45,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.297e+01 2.515e+01 2.826e+01 4.852e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-18 05:05:49,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0 2024-08-18 05:06:23,024 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13600, loss[loss=0.1124, beats_loss=0.01035, ecapa_loss=0.00015, whisper_loss=0.1006, over 22858.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001472, whisper_loss=0.0909, over 3888717.94 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:06:23,172 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 05:06:23,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712320.0, ans=0.1 2024-08-18 05:06:29,223 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=15.0 2024-08-18 05:06:30,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3712320.0, ans=0.1 2024-08-18 05:06:37,215 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 05:06:38,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3712420.0, ans=0.0 2024-08-18 05:06:40,989 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 05:06:41,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3712420.0, ans=0.2 2024-08-18 05:06:56,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3712520.0, ans=0.125 2024-08-18 05:06:59,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3712620.0, ans=0.125 2024-08-18 05:07:18,206 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-18 05:07:25,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13650, loss[loss=0.09409, beats_loss=0.01279, ecapa_loss=8.724e-05, whisper_loss=0.08042, over 16362.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001484, whisper_loss=0.09067, over 3887341.47 frames. ], batch size: 60, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:07:36,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3712820.0, ans=0.0 2024-08-18 05:07:37,531 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:07:38,356 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 05:07:50,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.696e+01 2.250e+01 2.521e+01 2.924e+01 4.330e+02, threshold=5.042e+01, percent-clipped=2.0 2024-08-18 05:07:50,898 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 05:07:51,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3713020.0, ans=0.07 2024-08-18 05:07:53,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3713020.0, ans=0.05 2024-08-18 05:08:22,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3713220.0, ans=0.2 2024-08-18 05:08:28,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13700, loss[loss=0.08759, beats_loss=0.01364, ecapa_loss=0.0001273, whisper_loss=0.07268, over 22603.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001467, whisper_loss=0.09041, over 3851914.00 frames. ], batch size: 93, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:08:30,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3713320.0, ans=0.0 2024-08-18 05:08:39,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3713320.0, ans=0.125 2024-08-18 05:08:50,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3713420.0, ans=0.2 2024-08-18 05:08:54,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3713520.0, ans=0.1 2024-08-18 05:09:05,785 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 05:09:08,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2024-08-18 05:09:30,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13750, loss[loss=0.1056, beats_loss=0.01048, ecapa_loss=0.0001384, whisper_loss=0.09379, over 18593.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01065, ecapa_loss=0.0001471, whisper_loss=0.09036, over 3856602.75 frames. ], batch size: 72, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:09:32,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3713820.0, ans=10.0 2024-08-18 05:09:54,371 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 05:09:55,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.369e+01 2.573e+01 3.048e+01 2.205e+02, threshold=5.146e+01, percent-clipped=4.0 2024-08-18 05:10:00,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. 
limit=15.0 2024-08-18 05:10:02,010 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0230120737105608, model_norm_threshold=51.46477508544922 2024-08-18 05:10:02,169 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.952e+05, grad_sumsq=7.749e+07, orig_rms_sq=1.026e-02 2024-08-18 05:10:07,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3714120.0, ans=0.2 2024-08-18 05:10:14,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3714120.0, ans=0.05 2024-08-18 05:10:23,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3714220.0, ans=0.125 2024-08-18 05:10:23,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-08-18 05:10:26,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3714220.0, ans=0.1 2024-08-18 05:10:27,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3714220.0, ans=0.125 2024-08-18 05:10:32,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13800, loss[loss=0.09917, beats_loss=0.008301, ecapa_loss=0.0001751, whisper_loss=0.08912, over 21045.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01064, ecapa_loss=0.0001465, whisper_loss=0.09024, over 3867406.92 frames. ], batch size: 84, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:10:45,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3714420.0, ans=0.2 2024-08-18 05:10:56,675 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 05:10:59,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3714520.0, ans=0.125 2024-08-18 05:11:13,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.36 vs. limit=15.0 2024-08-18 05:11:15,364 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 18 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 05:11:19,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3714620.0, ans=0.05 2024-08-18 05:11:35,167 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13850, loss[loss=0.1082, beats_loss=0.01092, ecapa_loss=0.0001238, whisper_loss=0.09602, over 21648.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01067, ecapa_loss=0.0001457, whisper_loss=0.08983, over 3856880.25 frames. ], batch size: 83, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:11:40,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3714820.0, ans=0.125 2024-08-18 05:11:59,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3715020.0, ans=0.125 2024-08-18 05:11:59,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.280e+01 2.571e+01 2.873e+01 2.236e+03, threshold=5.141e+01, percent-clipped=2.0 2024-08-18 05:12:09,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.82 vs. 
limit=10.0 2024-08-18 05:12:12,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3715120.0, ans=0.125 2024-08-18 05:12:35,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3715220.0, ans=0.125 2024-08-18 05:12:37,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13900, loss[loss=0.09439, beats_loss=0.0128, ecapa_loss=0.0001252, whisper_loss=0.08034, over 20351.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01062, ecapa_loss=0.0001453, whisper_loss=0.09031, over 3854433.74 frames. ], batch size: 79, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:12:39,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3715320.0, ans=0.125 2024-08-18 05:12:51,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3715420.0, ans=0.125 2024-08-18 05:12:54,520 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-08-18 05:13:07,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3715520.0, ans=0.125 2024-08-18 05:13:15,061 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 05:13:39,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 13950, loss[loss=0.1025, beats_loss=0.0113, ecapa_loss=0.0001589, whisper_loss=0.08961, over 22337.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01061, ecapa_loss=0.0001454, whisper_loss=0.09096, over 3865415.23 frames. 
], batch size: 92, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:13:42,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3715820.0, ans=0.125 2024-08-18 05:13:43,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3715820.0, ans=0.125 2024-08-18 05:13:49,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3715820.0, ans=0.125 2024-08-18 05:13:59,441 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 05:14:00,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2024-08-18 05:14:04,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.378e+01 2.637e+01 2.974e+01 4.505e+01, threshold=5.274e+01, percent-clipped=0.0 2024-08-18 05:14:23,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3716120.0, ans=0.125 2024-08-18 05:14:29,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3716220.0, ans=0.1 2024-08-18 05:14:35,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3716220.0, ans=0.1 2024-08-18 05:14:39,836 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=4.874e-02 2024-08-18 05:14:41,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14000, loss[loss=0.08219, beats_loss=0.01325, ecapa_loss=0.0001279, whisper_loss=0.06766, over 20027.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01059, ecapa_loss=0.0001449, whisper_loss=0.09103, over 3877478.31 frames. ], batch size: 82, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:14:49,188 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 05:15:00,490 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 05:15:05,316 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 05:15:12,916 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 05:15:32,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3716720.0, ans=0.1 2024-08-18 05:15:44,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14050, loss[loss=0.116, beats_loss=0.01107, ecapa_loss=0.0001628, whisper_loss=0.1033, over 22665.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0106, ecapa_loss=0.0001433, whisper_loss=0.09103, over 3878234.29 frames. 
], batch size: 94, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:15:57,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3716920.0, ans=0.1 2024-08-18 05:16:07,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3716920.0, ans=0.125 2024-08-18 05:16:09,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.272e+01 2.582e+01 2.804e+01 4.484e+01, threshold=5.163e+01, percent-clipped=0.0 2024-08-18 05:16:16,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3717020.0, ans=0.125 2024-08-18 05:16:17,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3717020.0, ans=0.0 2024-08-18 05:16:20,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3717020.0, ans=0.0 2024-08-18 05:16:28,396 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 05:16:28,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3717120.0, ans=0.125 2024-08-18 05:16:47,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14100, loss[loss=0.1011, beats_loss=0.01289, ecapa_loss=0.0001234, whisper_loss=0.08695, over 17940.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01058, ecapa_loss=0.000144, whisper_loss=0.09078, over 3852713.15 frames. 
], batch size: 70, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:16:53,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3717320.0, ans=0.2 2024-08-18 05:16:55,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3717320.0, ans=0.125 2024-08-18 05:16:57,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3717320.0, ans=0.0 2024-08-18 05:17:14,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3717520.0, ans=0.2 2024-08-18 05:17:21,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3717520.0, ans=0.125 2024-08-18 05:17:28,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3717620.0, ans=0.0 2024-08-18 05:17:30,800 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 05:17:35,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3717620.0, ans=0.125 2024-08-18 05:17:37,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3717720.0, ans=0.125 2024-08-18 05:17:43,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-08-18 05:17:49,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14150, loss[loss=0.1068, beats_loss=0.01004, ecapa_loss=0.0001475, whisper_loss=0.09529, over 14844.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.000144, whisper_loss=0.09026, over 3850081.80 frames. 
], batch size: 57, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:17:52,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3717820.0, ans=0.125 2024-08-18 05:18:08,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3717920.0, ans=0.04949747468305833 2024-08-18 05:18:14,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.317e+01 2.583e+01 2.863e+01 6.029e+01, threshold=5.165e+01, percent-clipped=1.0 2024-08-18 05:18:15,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3718020.0, ans=0.0 2024-08-18 05:18:27,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3718120.0, ans=0.0 2024-08-18 05:18:38,358 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 05:18:51,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14200, loss[loss=0.1098, beats_loss=0.009141, ecapa_loss=0.0001305, whisper_loss=0.0994, over 23097.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01066, ecapa_loss=0.0001442, whisper_loss=0.08953, over 3853684.34 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:18:54,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3718320.0, ans=0.125 2024-08-18 05:19:10,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3718420.0, ans=0.125 2024-08-18 05:19:11,977 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.16 vs. 
limit=15.0 2024-08-18 05:19:14,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3718420.0, ans=0.125 2024-08-18 05:19:22,726 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 39 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 05:19:30,184 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 05:19:37,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3718620.0, ans=0.1 2024-08-18 05:19:44,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3718720.0, ans=0.125 2024-08-18 05:19:54,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14250, loss[loss=0.1123, beats_loss=0.01129, ecapa_loss=0.0001549, whisper_loss=0.0995, over 21759.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001435, whisper_loss=0.08996, over 3899150.99 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:20:00,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3718820.0, ans=0.125 2024-08-18 05:20:10,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3718920.0, ans=0.125 2024-08-18 05:20:11,713 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 05:20:16,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.48 vs. 
limit=15.0 2024-08-18 05:20:20,722 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.251e+01 2.558e+01 2.864e+01 4.072e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-18 05:20:29,097 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 05:20:30,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3719020.0, ans=0.0 2024-08-18 05:20:45,249 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 05:21:09,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3719320.0, ans=0.2 2024-08-18 05:21:10,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14300, loss[loss=0.09341, beats_loss=0.01057, ecapa_loss=0.0001539, whisper_loss=0.0813, over 15752.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001429, whisper_loss=0.08975, over 3864666.15 frames. ], batch size: 63, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:21:17,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3719320.0, ans=0.125 2024-08-18 05:21:28,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3719420.0, ans=0.125 2024-08-18 05:21:45,154 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 05:21:53,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3719520.0, ans=0.0 2024-08-18 05:21:58,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3719620.0, ans=0.125 2024-08-18 05:22:00,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3719620.0, ans=0.2 2024-08-18 05:22:22,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-18 05:22:28,109 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 05:22:40,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14350, loss[loss=0.0895, beats_loss=0.01077, ecapa_loss=0.0001349, whisper_loss=0.07738, over 17445.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001448, whisper_loss=0.09033, over 3893522.02 frames. ], batch size: 66, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:22:54,045 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 05:23:01,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-08-18 05:23:06,180 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-18 05:23:22,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.359e+01 2.540e+01 2.872e+01 4.791e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-18 05:23:23,939 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 05:23:50,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3720120.0, ans=0.2 2024-08-18 05:24:05,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.55 vs. limit=22.5 2024-08-18 05:24:12,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.44 vs. limit=10.0 2024-08-18 05:24:18,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2024-08-18 05:24:22,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14400, loss[loss=0.1129, beats_loss=0.01014, ecapa_loss=0.0001516, whisper_loss=0.1013, over 23156.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001454, whisper_loss=0.09007, over 3900279.31 frames. ], batch size: 94, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:24:27,535 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 05:24:33,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3720320.0, ans=0.0 2024-08-18 05:24:34,216 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 05:24:36,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3720320.0, ans=0.09899494936611666 2024-08-18 05:24:49,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2024-08-18 05:25:17,454 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
24 from LS+wenet, 33 from Vox, 35 fro AS 2024-08-18 05:25:19,034 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=31.83 vs. limit=22.5 2024-08-18 05:25:50,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3720720.0, ans=0.125 2024-08-18 05:26:08,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 25, batch 14450, loss[loss=0.09218, beats_loss=0.01454, ecapa_loss=9.779e-05, whisper_loss=0.07666, over 16371.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001463, whisper_loss=0.09014, over 3908233.29 frames. ], batch size: 62, lr: 2.43e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 05:26:09,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3720820.0, ans=0.125 2024-08-18 05:26:48,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.314e+01 2.548e+01 2.908e+01 2.011e+02, threshold=5.097e+01, percent-clipped=3.0 2024-08-18 05:26:49,852 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2024-08-18 05:27:51,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3721230.0, ans=0.1 2024-08-18 05:27:51,742 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 0, loss[loss=0.08518, beats_loss=0.01188, ecapa_loss=0.0001488, whisper_loss=0.07181, over 16987.00 frames. ], tot_loss[loss=0.08518, beats_loss=0.01188, ecapa_loss=0.0001488, whisper_loss=0.07181, over 16987.00 frames. 
], batch size: 72, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:27:51,742 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 05:28:25,252 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on ASR_libri: loss=0.251, beats_loss=0, ecapa_loss=0.0005273, whisper_loss=0.2457, over 922467.00 frames. 2024-08-18 05:28:39,623 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on SV_voxceleb1: loss=0.004107, beats_loss=0, ecapa_loss=0.0004107, whisper_loss=0, over 939242.00 frames. 2024-08-18 05:29:53,194 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6643, 2.3262, 2.5556, 2.3901], device='cuda:3') 2024-08-18 05:30:15,543 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on AT_audioset: loss=0.02319, beats_loss=0.02319, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 05:30:15,546 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 05:30:19,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3721230.0, ans=0.2 2024-08-18 05:30:24,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3721230.0, ans=0.0 2024-08-18 05:30:27,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3721230.0, ans=0.1 2024-08-18 05:30:42,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3721330.0, ans=0.1 2024-08-18 05:31:02,323 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
19 from LS+wenet, 26 from Vox, 46 fro AS 2024-08-18 05:31:15,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=3721430.0, ans=0.02 2024-08-18 05:31:42,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3721530.0, ans=0.125 2024-08-18 05:31:58,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3721630.0, ans=0.125 2024-08-18 05:32:09,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 50, loss[loss=0.07586, beats_loss=0.0105, ecapa_loss=0.0001316, whisper_loss=0.06405, over 22785.00 frames. ], tot_loss[loss=0.09652, beats_loss=0.009834, ecapa_loss=0.0001484, whisper_loss=0.0852, over 899453.14 frames. ], batch size: 91, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:32:17,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2024-08-18 05:32:25,254 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 05:32:31,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3721830.0, ans=0.0 2024-08-18 05:32:48,248 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 05:33:07,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3721930.0, ans=0.1 2024-08-18 05:33:12,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.655e+01 2.458e+01 2.767e+01 3.037e+01 4.050e+01, threshold=5.534e+01, percent-clipped=0.0 2024-08-18 05:33:16,521 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-18 05:33:27,784 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 05:33:42,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.29 vs. limit=10.0 2024-08-18 05:33:52,444 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 05:33:54,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2024-08-18 05:33:58,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 100, loss[loss=0.1124, beats_loss=0.007142, ecapa_loss=0.0001334, whisper_loss=0.104, over 16148.00 frames. ], tot_loss[loss=0.09893, beats_loss=0.0096, ecapa_loss=0.0001454, whisper_loss=0.08788, over 1547029.50 frames. 
], batch size: 59, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:34:03,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3722230.0, ans=0.0 2024-08-18 05:34:10,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3722230.0, ans=0.0 2024-08-18 05:34:12,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3722230.0, ans=0.125 2024-08-18 05:34:15,084 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 05:34:16,312 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 05:35:00,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.56 vs. limit=15.0 2024-08-18 05:35:22,342 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 05:35:35,842 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 150, loss[loss=0.1092, beats_loss=0.007141, ecapa_loss=0.0001326, whisper_loss=0.1007, over 14466.00 frames. ], tot_loss[loss=0.09934, beats_loss=0.009687, ecapa_loss=0.0001441, whisper_loss=0.08822, over 2030869.12 frames. ], batch size: 53, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:35:49,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3722730.0, ans=0.2 2024-08-18 05:35:57,703 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 05:36:17,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3722930.0, ans=0.125 2024-08-18 05:36:23,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.548e+01 2.773e+01 3.033e+01 4.359e+01, threshold=5.546e+01, percent-clipped=0.0 2024-08-18 05:36:55,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 200, loss[loss=0.1115, beats_loss=0.009364, ecapa_loss=0.0001501, whisper_loss=0.1006, over 19328.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.009802, ecapa_loss=0.0001461, whisper_loss=0.0898, over 2424696.26 frames. ], batch size: 77, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:36:58,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-18 05:36:59,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3723230.0, ans=0.125 2024-08-18 05:37:00,589 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.130e-01 2024-08-18 05:37:12,033 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 05:37:26,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3723430.0, ans=0.0 2024-08-18 05:37:26,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. 
limit=6.0 2024-08-18 05:37:39,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3723530.0, ans=0.2 2024-08-18 05:37:47,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3723530.0, ans=0.05 2024-08-18 05:38:09,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 250, loss[loss=0.1078, beats_loss=0.01179, ecapa_loss=0.000145, whisper_loss=0.0946, over 21954.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.00999, ecapa_loss=0.0001467, whisper_loss=0.0899, over 2704075.64 frames. ], batch size: 86, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:38:13,206 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.34 vs. limit=22.5 2024-08-18 05:38:24,587 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 05:38:28,371 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2024-08-18 05:38:30,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3723830.0, ans=0.125 2024-08-18 05:38:36,877 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 05:38:49,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.303e+01 2.551e+01 2.928e+01 5.127e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-18 05:39:03,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2024-08-18 05:39:14,536 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
23 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 05:39:19,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 300, loss[loss=0.1017, beats_loss=0.01263, ecapa_loss=0.0001347, whisper_loss=0.08772, over 23542.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01017, ecapa_loss=0.0001458, whisper_loss=0.0891, over 2945816.04 frames. ], batch size: 92, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:39:46,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3724430.0, ans=0.0 2024-08-18 05:39:51,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3724430.0, ans=0.09899494936611666 2024-08-18 05:39:55,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3724430.0, ans=0.125 2024-08-18 05:40:02,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3724530.0, ans=0.125 2024-08-18 05:40:13,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3724630.0, ans=0.0 2024-08-18 05:40:14,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3724630.0, ans=0.0 2024-08-18 05:40:29,026 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 350, loss[loss=0.09114, beats_loss=0.01196, ecapa_loss=0.0001265, whisper_loss=0.07791, over 20387.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01022, ecapa_loss=0.0001465, whisper_loss=0.08927, over 3171959.66 frames. 
], batch size: 81, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:40:49,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3724830.0, ans=0.125 2024-08-18 05:41:06,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.645e+01 2.139e+01 2.480e+01 2.873e+01 3.431e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-18 05:41:09,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.63 vs. limit=15.0 2024-08-18 05:41:09,672 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 05:41:12,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3725030.0, ans=0.125 2024-08-18 05:41:23,906 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 05:41:33,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 400, loss[loss=0.09035, beats_loss=0.01229, ecapa_loss=0.0001787, whisper_loss=0.07628, over 21107.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01033, ecapa_loss=0.000145, whisper_loss=0.08922, over 3296917.58 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:41:35,376 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 05:41:36,556 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 05:41:36,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3725230.0, ans=0.1 2024-08-18 05:41:52,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3725330.0, ans=0.2 2024-08-18 05:41:59,079 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 05:42:03,562 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 05:42:07,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=3725430.0, ans=0.2 2024-08-18 05:42:10,141 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 05:42:12,450 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 05:42:21,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2024-08-18 05:42:29,791 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 05:42:39,646 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 450, loss[loss=0.1014, beats_loss=0.009289, ecapa_loss=0.0001695, whisper_loss=0.09037, over 22050.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01036, ecapa_loss=0.0001443, whisper_loss=0.08888, over 3413930.60 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:42:41,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.51 vs. limit=12.0 2024-08-18 05:42:51,317 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 05:43:08,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3725930.0, ans=0.125 2024-08-18 05:43:17,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.230e+01 2.487e+01 2.863e+01 4.267e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-18 05:43:18,781 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-18 05:43:19,112 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 05:43:31,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3726130.0, ans=0.0 2024-08-18 05:43:39,445 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.69 vs. limit=15.0 2024-08-18 05:43:45,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 500, loss[loss=0.1022, beats_loss=0.009134, ecapa_loss=0.0001387, whisper_loss=0.09168, over 16836.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.0104, ecapa_loss=0.0001447, whisper_loss=0.08861, over 3526278.05 frames. ], batch size: 63, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:43:55,066 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 05:43:58,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3726330.0, ans=0.1 2024-08-18 05:44:03,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3726330.0, ans=0.1 2024-08-18 05:44:15,996 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 05:44:37,212 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 05:44:50,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 550, loss[loss=0.1052, beats_loss=0.009001, ecapa_loss=0.0001195, whisper_loss=0.09501, over 17912.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0104, ecapa_loss=0.0001443, whisper_loss=0.08857, over 3585851.52 frames. ], batch size: 67, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:44:55,671 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 05:44:57,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3726730.0, ans=0.0 2024-08-18 05:45:01,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3726730.0, ans=0.125 2024-08-18 05:45:01,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3726730.0, ans=0.125 2024-08-18 05:45:25,762 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 05:45:28,248 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.311e+01 2.532e+01 2.757e+01 3.672e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 05:45:31,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3727030.0, ans=0.0 2024-08-18 05:45:48,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3727130.0, ans=0.0 2024-08-18 05:45:55,378 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 600, loss[loss=0.0796, beats_loss=0.009286, ecapa_loss=0.0001595, whisper_loss=0.06872, over 15746.00 frames. 
], tot_loss[loss=0.1006, beats_loss=0.01035, ecapa_loss=0.0001442, whisper_loss=0.08881, over 3661019.38 frames. ], batch size: 60, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:45:55,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.23 vs. limit=10.0 2024-08-18 05:46:05,940 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 05:46:10,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3727330.0, ans=0.2 2024-08-18 05:46:19,389 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 05:46:46,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3727630.0, ans=0.0 2024-08-18 05:46:56,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3727630.0, ans=0.125 2024-08-18 05:47:00,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 650, loss[loss=0.1092, beats_loss=0.009083, ecapa_loss=0.0001621, whisper_loss=0.09854, over 16180.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01033, ecapa_loss=0.0001451, whisper_loss=0.08874, over 3681741.76 frames. ], batch size: 62, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:47:25,629 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 05:47:38,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.282e+01 2.594e+01 2.924e+01 5.519e+01, threshold=5.188e+01, percent-clipped=2.0 2024-08-18 05:47:40,764 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
26 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-18 05:48:05,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 700, loss[loss=0.09499, beats_loss=0.01037, ecapa_loss=0.0001293, whisper_loss=0.08333, over 18222.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01031, ecapa_loss=0.0001451, whisper_loss=0.08912, over 3704997.79 frames. ], batch size: 74, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:48:10,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3728230.0, ans=0.125 2024-08-18 05:48:14,825 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0922105461359024, model_norm_threshold=51.87888717651367 2024-08-18 05:48:14,989 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.030e+04, grad_sumsq=4.030e+04, orig_rms_sq=1.000e+00 2024-08-18 05:48:25,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3728330.0, ans=0.0 2024-08-18 05:48:27,720 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 05:48:33,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3728430.0, ans=0.125 2024-08-18 05:48:44,349 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 05:48:47,153 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-18 05:48:47,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3728530.0, ans=0.125 2024-08-18 05:48:47,984 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2024-08-18 05:49:00,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3728630.0, ans=0.2 2024-08-18 05:49:00,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3728630.0, ans=0.04949747468305833 2024-08-18 05:49:01,152 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 05:49:01,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3728630.0, ans=0.125 2024-08-18 05:49:04,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-18 05:49:10,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 750, loss[loss=0.09759, beats_loss=0.009611, ecapa_loss=0.0001226, whisper_loss=0.08676, over 21627.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01033, ecapa_loss=0.0001435, whisper_loss=0.08954, over 3759612.32 frames. ], batch size: 84, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:49:10,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3728730.0, ans=0.125 2024-08-18 05:49:13,602 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 05:49:15,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3728730.0, ans=0.125 2024-08-18 05:49:18,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3728730.0, ans=0.0 2024-08-18 05:49:24,976 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 05:49:40,131 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 05:49:47,397 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.292e+01 2.475e+01 2.827e+01 5.626e+02, threshold=4.950e+01, percent-clipped=2.0 2024-08-18 05:49:47,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3729030.0, ans=0.2 2024-08-18 05:50:06,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-18 05:50:08,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-08-18 05:50:11,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3729130.0, ans=0.0 2024-08-18 05:50:16,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 800, loss[loss=0.1032, beats_loss=0.01019, ecapa_loss=0.0001474, whisper_loss=0.09152, over 16211.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.0001423, whisper_loss=0.08887, over 3770180.61 frames. ], batch size: 65, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:50:23,364 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 05:50:39,638 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 05:50:49,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3729430.0, ans=0.2 2024-08-18 05:51:11,800 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 05:51:18,771 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-18 05:51:23,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 850, loss[loss=0.1081, beats_loss=0.01121, ecapa_loss=0.0001511, whisper_loss=0.09542, over 21188.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001436, whisper_loss=0.08884, over 3785017.37 frames. ], batch size: 85, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:51:33,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729730.0, ans=0.1 2024-08-18 05:51:39,807 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 05:51:51,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-18 05:51:54,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3729930.0, ans=0.1 2024-08-18 05:52:02,224 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.231e+01 2.471e+01 2.825e+01 3.854e+01, threshold=4.942e+01, percent-clipped=0.0 2024-08-18 05:52:15,570 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 05:52:18,249 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 05:52:18,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2024-08-18 05:52:28,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3730130.0, ans=0.125 2024-08-18 05:52:30,531 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 900, loss[loss=0.09254, beats_loss=0.01046, ecapa_loss=0.0001176, whisper_loss=0.08091, over 15692.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0104, ecapa_loss=0.000143, whisper_loss=0.08902, over 3793416.00 frames. ], batch size: 58, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:52:49,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3730330.0, ans=0.1 2024-08-18 05:53:01,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3730430.0, ans=0.125 2024-08-18 05:53:12,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3730530.0, ans=0.125 2024-08-18 05:53:14,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3730530.0, ans=0.2 2024-08-18 05:53:22,955 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 05:53:27,417 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 05:53:38,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 950, loss[loss=0.1042, beats_loss=0.009529, ecapa_loss=0.0001589, whisper_loss=0.0931, over 15247.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01044, ecapa_loss=0.0001428, whisper_loss=0.08832, over 3797272.25 frames. 
], batch size: 61, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:53:39,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3730730.0, ans=0.025 2024-08-18 05:54:02,760 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 31 from Vox, 34 fro AS 2024-08-18 05:54:05,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3730930.0, ans=0.125 2024-08-18 05:54:12,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3730930.0, ans=0.0 2024-08-18 05:54:17,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.336e+01 2.576e+01 2.851e+01 4.260e+01, threshold=5.152e+01, percent-clipped=0.0 2024-08-18 05:54:21,968 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 05:54:24,454 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 05:54:27,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3731030.0, ans=0.025 2024-08-18 05:54:46,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1000, loss[loss=0.1006, beats_loss=0.009365, ecapa_loss=0.0001513, whisper_loss=0.08973, over 17680.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01041, ecapa_loss=0.0001429, whisper_loss=0.08854, over 3793961.69 frames. ], batch size: 70, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:54:49,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3731230.0, ans=0.0 2024-08-18 05:54:50,852 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 05:54:55,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3731230.0, ans=0.1 2024-08-18 05:54:55,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3731230.0, ans=0.125 2024-08-18 05:54:56,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3731230.0, ans=0.0 2024-08-18 05:55:00,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3731330.0, ans=0.0 2024-08-18 05:55:07,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3731330.0, ans=0.0 2024-08-18 05:55:38,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3731530.0, ans=0.0 2024-08-18 05:55:43,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3731630.0, ans=0.2 2024-08-18 05:55:53,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1050, loss[loss=0.1043, beats_loss=0.009818, ecapa_loss=0.0001615, whisper_loss=0.09288, over 15634.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001431, whisper_loss=0.08927, over 3839274.21 frames. ], batch size: 61, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:55:54,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3731730.0, ans=0.0 2024-08-18 05:55:56,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3731730.0, ans=0.0 2024-08-18 05:56:12,860 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
28 from LS+wenet, 31 from Vox, 22 fro AS 2024-08-18 05:56:35,551 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.334e+01 2.539e+01 2.786e+01 5.351e+01, threshold=5.078e+01, percent-clipped=0.0 2024-08-18 05:56:36,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3732030.0, ans=0.125 2024-08-18 05:56:40,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3732030.0, ans=0.125 2024-08-18 05:56:50,767 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 05:56:56,004 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 05:56:56,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-18 05:56:59,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3732130.0, ans=0.125 2024-08-18 05:57:03,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3732130.0, ans=0.0 2024-08-18 05:57:06,310 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1100, loss[loss=0.07829, beats_loss=0.01124, ecapa_loss=0.0001193, whisper_loss=0.06585, over 16285.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01035, ecapa_loss=0.0001427, whisper_loss=0.08958, over 3845084.89 frames. 
], batch size: 63, lr: 2.38e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:57:09,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3732230.0, ans=0.2 2024-08-18 05:57:14,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3732230.0, ans=0.125 2024-08-18 05:57:15,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732230.0, ans=0.1 2024-08-18 05:57:17,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3732230.0, ans=0.0 2024-08-18 05:57:22,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3732330.0, ans=0.1 2024-08-18 05:57:23,224 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.94 vs. limit=15.0 2024-08-18 05:57:26,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3732330.0, ans=0.1 2024-08-18 05:57:28,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3732330.0, ans=0.125 2024-08-18 05:57:39,730 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 05:58:08,089 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 05:58:12,460 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 05:58:16,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1150, loss[loss=0.1206, beats_loss=0.0102, ecapa_loss=0.0001343, whisper_loss=0.109, over 18462.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01033, ecapa_loss=0.0001432, whisper_loss=0.09041, over 3825658.24 frames. ], batch size: 72, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:58:16,749 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 05:58:30,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3732830.0, ans=0.125 2024-08-18 05:58:35,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3732830.0, ans=0.2 2024-08-18 05:58:37,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3732830.0, ans=0.035 2024-08-18 05:58:39,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3732830.0, ans=0.125 2024-08-18 05:58:57,161 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.301e+01 2.610e+01 2.950e+01 4.151e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-18 05:59:01,447 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 05:59:04,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.93 vs. limit=22.5 2024-08-18 05:59:08,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3733030.0, ans=0.125 2024-08-18 05:59:11,099 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
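The periodic `Clipping_scale=2.0, grad-norm quartiles ...` lines summarize recently observed gradient norms as five order statistics (min, 25%, median, 75%, max). In every such line in this section, the logged threshold equals the clipping scale times the middle value, e.g. 2.0 × 2.610e+01 = 5.220e+01 for the line at 05:58:57. An illustrative sketch of that relationship (not the icefall implementation):

```python
import statistics

def clip_threshold(recent_grad_norms, clipping_scale=2.0):
    """Threshold = clipping_scale x median of recently observed gradient norms."""
    return clipping_scale * statistics.median(recent_grad_norms)

def clip_factor(grad_norm, threshold):
    """Factor applied to a batch's gradients; a factor < 1 counts toward percent-clipped."""
    return min(1.0, threshold / grad_norm)

quartiles = [18.94, 23.01, 26.10, 29.50, 41.51]  # min, q1, median, q3, max from the log
print(clip_threshold(quartiles))  # 52.2, the logged threshold=5.220e+01
```

Under this reading, `percent-clipped=1.0` means 1% of recent batches had a gradient norm above twice the running median.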
17 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 05:59:23,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3733130.0, ans=0.0 2024-08-18 05:59:27,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1200, loss[loss=0.1136, beats_loss=0.01081, ecapa_loss=0.0001361, whisper_loss=0.1014, over 19778.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001442, whisper_loss=0.09022, over 3793622.23 frames. ], batch size: 77, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 05:59:37,550 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 05:59:41,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3733330.0, ans=0.2 2024-08-18 05:59:44,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3733330.0, ans=0.0 2024-08-18 05:59:50,891 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 06:00:02,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-18 06:00:10,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3733530.0, ans=0.125 2024-08-18 06:00:16,332 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 06:00:17,137 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.68 vs. limit=15.0 2024-08-18 06:00:25,206 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 06:00:25,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2024-08-18 06:00:38,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3733630.0, ans=0.125 2024-08-18 06:00:40,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1250, loss[loss=0.1083, beats_loss=0.01164, ecapa_loss=0.0001279, whisper_loss=0.09542, over 23258.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01044, ecapa_loss=0.0001433, whisper_loss=0.09007, over 3828625.33 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:00:54,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3733830.0, ans=0.125 2024-08-18 06:01:02,706 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-18 06:01:11,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3733930.0, ans=0.125 2024-08-18 06:01:15,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3733930.0, ans=0.0 2024-08-18 06:01:24,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.310e+01 2.549e+01 2.839e+01 4.783e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-18 06:01:33,304 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 06:01:35,124 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 06:01:45,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.21 vs. 
limit=15.0 2024-08-18 06:01:49,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2024-08-18 06:01:50,035 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 10 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 06:01:55,072 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1300, loss[loss=0.1013, beats_loss=0.009285, ecapa_loss=0.0001593, whisper_loss=0.09044, over 17916.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01046, ecapa_loss=0.0001434, whisper_loss=0.08911, over 3836144.43 frames. ], batch size: 71, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:01:56,897 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 06:02:04,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3734230.0, ans=0.0 2024-08-18 06:02:14,106 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-18 06:02:15,470 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 06:02:17,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3734330.0, ans=0.2 2024-08-18 06:02:19,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3734330.0, ans=0.1 2024-08-18 06:02:20,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3734330.0, ans=0.0 2024-08-18 06:02:34,804 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 06:02:45,980 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
32 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 06:02:56,690 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 06:03:12,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1350, loss[loss=0.08567, beats_loss=0.01164, ecapa_loss=0.0001323, whisper_loss=0.07272, over 19623.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01045, ecapa_loss=0.0001427, whisper_loss=0.08903, over 3842937.00 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:03:24,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-08-18 06:03:29,321 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 06:03:37,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3734830.0, ans=0.09899494936611666 2024-08-18 06:03:40,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3734830.0, ans=0.05 2024-08-18 06:03:43,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3734830.0, ans=0.125 2024-08-18 06:03:46,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3734930.0, ans=0.125 2024-08-18 06:03:54,098 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
15 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 06:04:00,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.786e+01 2.254e+01 2.510e+01 2.787e+01 4.431e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 06:04:16,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3735030.0, ans=0.1 2024-08-18 06:04:26,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3735130.0, ans=0.0 2024-08-18 06:04:34,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2024-08-18 06:04:34,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1400, loss[loss=0.1045, beats_loss=0.01059, ecapa_loss=0.000169, whisper_loss=0.09227, over 21893.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01043, ecapa_loss=0.0001427, whisper_loss=0.08964, over 3864780.04 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:04:54,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3735330.0, ans=0.125 2024-08-18 06:05:00,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3735330.0, ans=0.0 2024-08-18 06:05:01,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3735330.0, ans=0.125 2024-08-18 06:05:07,191 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 06:05:11,478 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 06:05:12,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3735430.0, ans=0.125 2024-08-18 06:05:14,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3735430.0, ans=0.125 2024-08-18 06:05:18,666 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-18 06:05:22,941 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-18 06:05:36,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.79 vs. limit=22.5 2024-08-18 06:05:37,051 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-18 06:05:47,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3735630.0, ans=0.0 2024-08-18 06:05:50,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3735630.0, ans=0.04949747468305833 2024-08-18 06:06:25,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1450, loss[loss=0.06741, beats_loss=0.01381, ecapa_loss=0.0001009, whisper_loss=0.05259, over 23106.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001421, whisper_loss=0.08912, over 3843963.69 frames. ], batch size: 95, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:06:30,318 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 06:06:41,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.18 vs. 
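The `ScheduledFloat` lines report hyperparameters (dropout probabilities, skip rates, bypass scale floors) that vary as piecewise-linear functions of `batch_count`; at batch_count ≈ 3.73M every schedule shown here has long since flattened at its final value (`ans=0.1`, `0.125`, `0.2`, ...). A toy reimplementation of that interpolation, with hypothetical breakpoints since the actual schedules live in the model code:

```python
def scheduled_float(batch_count, points):
    """Piecewise-linear schedule over batch_count.

    points: sorted (batch_count, value) pairs; the value is clamped
    outside the first/last breakpoint. The breakpoints used below are
    made up for illustration, not read from the zipformer source.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a dropout that decays from 0.3 to a final 0.1 over the first 20k batches:
sched = [(0.0, 0.3), (20000.0, 0.1)]
print(scheduled_float(3732230.0, sched))  # 0.1 -- the flattened value logged above
```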
limit=10.0 2024-08-18 06:06:46,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3735830.0, ans=0.1 2024-08-18 06:07:01,471 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 06:07:08,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3735930.0, ans=10.0 2024-08-18 06:07:09,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.184e+01 2.399e+01 2.651e+01 6.055e+01, threshold=4.798e+01, percent-clipped=1.0 2024-08-18 06:07:13,682 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 06:07:15,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3736030.0, ans=0.125 2024-08-18 06:07:25,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3736130.0, ans=0.125 2024-08-18 06:07:28,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3736130.0, ans=0.125 2024-08-18 06:07:32,451 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-18 06:07:33,179 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.577e-01 2024-08-18 06:07:40,715 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1500, loss[loss=0.07003, beats_loss=0.01313, ecapa_loss=0.000116, whisper_loss=0.05574, over 16770.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01056, ecapa_loss=0.0001407, whisper_loss=0.08832, over 3821328.18 frames. 
], batch size: 66, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:07:44,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3736230.0, ans=0.125 2024-08-18 06:07:49,067 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-18 06:07:54,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3736330.0, ans=0.2 2024-08-18 06:08:19,845 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 06:08:42,694 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 06:08:52,586 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 06:08:55,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1550, loss[loss=0.1219, beats_loss=0.00919, ecapa_loss=0.0001368, whisper_loss=0.1113, over 22319.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.08863, over 3822963.57 frames. ], batch size: 86, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:09:13,163 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 06:09:22,517 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 12 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 06:09:37,238 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
26 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 06:09:38,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.223e+01 2.492e+01 2.734e+01 3.919e+01, threshold=4.985e+01, percent-clipped=0.0 2024-08-18 06:09:59,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3737130.0, ans=0.1 2024-08-18 06:10:02,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.70 vs. limit=22.5 2024-08-18 06:10:04,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3737130.0, ans=0.125 2024-08-18 06:10:06,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3737130.0, ans=0.125 2024-08-18 06:10:08,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1600, loss[loss=0.09924, beats_loss=0.01021, ecapa_loss=0.0001205, whisper_loss=0.08783, over 16267.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.00014, whisper_loss=0.08892, over 3837600.17 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:10:22,356 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
13 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 06:10:26,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3737330.0, ans=0.125 2024-08-18 06:10:30,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3737330.0, ans=0.125 2024-08-18 06:10:37,245 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.268e+05 2024-08-18 06:10:38,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3737430.0, ans=0.125 2024-08-18 06:10:44,764 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-18 06:11:05,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3737630.0, ans=0.125 2024-08-18 06:11:06,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.69 vs. limit=15.0 2024-08-18 06:11:16,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3737630.0, ans=0.2 2024-08-18 06:11:20,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1650, loss[loss=0.1276, beats_loss=0.008375, ecapa_loss=0.0001811, whisper_loss=0.1175, over 21659.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001408, whisper_loss=0.08956, over 3880579.25 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:11:24,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3737730.0, ans=10.0 2024-08-18 06:11:38,936 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
22 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 06:11:40,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3737830.0, ans=0.125 2024-08-18 06:11:41,488 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 06:11:42,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3737830.0, ans=0.035 2024-08-18 06:11:56,994 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=12.0 2024-08-18 06:11:58,786 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.262e+01 2.498e+01 2.828e+01 4.112e+01, threshold=4.996e+01, percent-clipped=0.0 2024-08-18 06:12:09,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.25 vs. limit=22.5 2024-08-18 06:12:13,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3738130.0, ans=0.0 2024-08-18 06:12:14,015 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 06:12:15,033 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09963374584913254, model_norm_threshold=49.962425231933594 2024-08-18 06:12:15,208 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.352e+04, grad_sumsq=4.352e+04, orig_rms_sq=1.000e+00 2024-08-18 06:12:16,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3738130.0, ans=0.025 2024-08-18 06:12:20,906 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
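The WARNING at 06:12:15 shows the emergency path of the same clipping scheme: a batch whose gradient norm exceeds `model_norm_threshold` has its gradients multiplied by `threshold / norm` (here 49.9624 × (1/0.099634) implies a batch gradient norm of about 501.5, and 5.015e+02 indeed appears as the max quartile in the grad-norm summary at 06:14:15). A sketch of that factor, illustrative only:

```python
def grad_scaling_factor(grad_norm, model_norm_threshold):
    """Gradients are scaled down only when their norm exceeds the threshold,
    so that the effective gradient norm equals the threshold afterwards."""
    if grad_norm <= model_norm_threshold:
        return 1.0
    return model_norm_threshold / grad_norm

# Warning above: model_norm_threshold=49.962425..., factor=0.0996337...,
# implying a batch gradient norm of roughly 49.9624 / 0.0996337 = 501.5.
print(grad_scaling_factor(501.46, 49.962425231933594))
```

The accompanying `Parameter dominating tot_sumsq` line then names the parameter contributing most to the squared gradient norm (17% of it, for `norm.log_scale` here), which is useful for diagnosing which layer caused the spike.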
21 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-18 06:12:27,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1700, loss[loss=0.09782, beats_loss=0.007941, ecapa_loss=0.0001704, whisper_loss=0.08817, over 15282.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001402, whisper_loss=0.09073, over 3887473.48 frames. ], batch size: 61, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:12:28,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3738230.0, ans=0.125 2024-08-18 06:12:28,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3738230.0, ans=0.125 2024-08-18 06:12:28,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3738230.0, ans=0.125 2024-08-18 06:12:35,861 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 06:12:39,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3738230.0, ans=0.125 2024-08-18 06:12:43,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3738330.0, ans=0.1 2024-08-18 06:12:48,197 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 06:12:50,690 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 06:12:57,699 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
24 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 06:13:08,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3738530.0, ans=0.125 2024-08-18 06:13:14,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3738530.0, ans=0.09899494936611666 2024-08-18 06:13:23,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.61 vs. limit=10.0 2024-08-18 06:13:26,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3738630.0, ans=0.0 2024-08-18 06:13:35,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1750, loss[loss=0.1081, beats_loss=0.008027, ecapa_loss=0.0001482, whisper_loss=0.0986, over 16260.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001406, whisper_loss=0.09099, over 3880512.85 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:13:35,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3738730.0, ans=0.025 2024-08-18 06:13:51,246 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 06:13:51,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.02 vs. 
limit=22.5 2024-08-18 06:13:55,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3738830.0, ans=0.09899494936611666 2024-08-18 06:14:06,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3738930.0, ans=0.125 2024-08-18 06:14:08,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3738930.0, ans=0.05 2024-08-18 06:14:08,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3738930.0, ans=0.125 2024-08-18 06:14:09,125 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 06:14:15,688 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.240e+01 2.515e+01 2.865e+01 5.015e+02, threshold=5.030e+01, percent-clipped=2.0 2024-08-18 06:14:20,889 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 06:14:39,003 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 06:14:40,939 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-18 06:14:42,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1800, loss[loss=0.07273, beats_loss=0.01424, ecapa_loss=0.0001066, whisper_loss=0.05742, over 16047.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01036, ecapa_loss=0.0001413, whisper_loss=0.09058, over 3851932.30 frames. 
], batch size: 65, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:14:51,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3739230.0, ans=0.09899494936611666 2024-08-18 06:14:59,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3739330.0, ans=0.125 2024-08-18 06:15:05,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2024-08-18 06:15:34,887 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04103388637304306, model_norm_threshold=50.29759979248047 2024-08-18 06:15:35,049 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.438e+05, grad_sumsq=3.364e+07, orig_rms_sq=1.022e-02 2024-08-18 06:15:40,718 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 06:15:48,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3739730.0, ans=0.125 2024-08-18 06:15:49,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1850, loss[loss=0.1028, beats_loss=0.01091, ecapa_loss=0.0001245, whisper_loss=0.09065, over 23008.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01026, ecapa_loss=0.0001416, whisper_loss=0.09127, over 3819603.22 frames. 
], batch size: 89, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:16:23,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3739930.0, ans=0.125 2024-08-18 06:16:27,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3739930.0, ans=0.2 2024-08-18 06:16:29,341 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.281e+01 2.584e+01 3.021e+01 1.226e+03, threshold=5.167e+01, percent-clipped=3.0 2024-08-18 06:16:42,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3740030.0, ans=0.0 2024-08-18 06:16:46,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3740130.0, ans=0.0 2024-08-18 06:16:47,046 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 06:16:47,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.46 vs. limit=15.0 2024-08-18 06:16:54,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3740130.0, ans=0.2 2024-08-18 06:16:55,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3740130.0, ans=0.0 2024-08-18 06:16:58,384 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1900, loss[loss=0.0971, beats_loss=0.00978, ecapa_loss=0.000144, whisper_loss=0.08588, over 17808.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01028, ecapa_loss=0.0001425, whisper_loss=0.09057, over 3790906.97 frames. ], batch size: 74, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:17:03,397 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 06:17:06,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=15.0 2024-08-18 06:17:08,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3740230.0, ans=0.2 2024-08-18 06:17:11,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.09 vs. limit=22.5 2024-08-18 06:17:23,019 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 06:17:23,944 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 06:17:44,431 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-18 06:17:47,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3740530.0, ans=0.0 2024-08-18 06:17:52,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3740630.0, ans=0.2 2024-08-18 06:18:05,313 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 1950, loss[loss=0.1107, beats_loss=0.008702, ecapa_loss=0.0001473, whisper_loss=0.1005, over 20762.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01036, ecapa_loss=0.0001416, whisper_loss=0.08973, over 3751399.90 frames. 
], batch size: 81, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:18:26,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3740830.0, ans=0.2 2024-08-18 06:18:40,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3740930.0, ans=0.125 2024-08-18 06:18:43,505 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.240e+01 2.452e+01 2.849e+01 7.205e+01, threshold=4.903e+01, percent-clipped=1.0 2024-08-18 06:18:50,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3741030.0, ans=0.0 2024-08-18 06:18:57,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3741130.0, ans=0.125 2024-08-18 06:19:01,049 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 06:19:04,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3741130.0, ans=0.1 2024-08-18 06:19:11,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2000, loss[loss=0.09568, beats_loss=0.009461, ecapa_loss=0.0001278, whisper_loss=0.08494, over 16955.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001407, whisper_loss=0.0896, over 3797818.29 frames. ], batch size: 66, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:19:49,460 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 06:20:01,500 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
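At the batch-2000 log line, `grad_scale` jumps from 5.764607523034235e+17 to 1.152921504606847e+18, i.e. from exactly 2**59 to 2**60. This run uses bf16 with `use_amp=True` (see header), and dynamic loss scaling grows the scale by a power of two after a sustained run of overflow-free steps, halving it on overflow. A sketch of that update rule, modeled on the growth/backoff behavior of `torch.cuda.amp.GradScaler` rather than taken from icefall:

```python
def next_scale(scale, found_inf, growth_factor=2.0, backoff_factor=0.5):
    """Dynamic loss scaling: grow after clean steps, back off on inf/nan grads."""
    return scale * (backoff_factor if found_inf else growth_factor)

assert 5.764607523034235e+17 == 2.0**59   # grad_scale before batch 2000
assert 1.152921504606847e+18 == 2.0**60   # grad_scale from batch 2000 on
print(next_scale(2.0**59, found_inf=False))  # 1.152921504606847e+18
```

Power-of-two scales keep the multiply/divide exact in floating point, so scaling and unscaling gradients introduces no rounding error.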
24 from LS+wenet, 19 from Vox, 22 from AS 2024-08-18 06:20:10,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3741630.0, ans=0.2 2024-08-18 06:20:13,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3741630.0, ans=0.0 2024-08-18 06:20:16,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2050, loss[loss=0.1236, beats_loss=0.00871, ecapa_loss=0.0001516, whisper_loss=0.1134, over 16700.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01034, ecapa_loss=0.000142, whisper_loss=0.08961, over 3795653.02 frames. ], batch size: 64, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:20:21,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3741730.0, ans=0.1 2024-08-18 06:20:24,426 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 06:20:26,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3741730.0, ans=10.0 2024-08-18 06:20:32,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3741830.0, ans=0.1 2024-08-18 06:20:37,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3741830.0, ans=0.125 2024-08-18 06:20:45,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3741930.0, ans=0.125 2024-08-18 06:20:49,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3741930.0, ans=0.125 2024-08-18 06:20:53,164 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
17 from LS+wenet, 22 from Vox, 38 from AS 2024-08-18 06:20:53,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3741930.0, ans=0.2 2024-08-18 06:20:54,432 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.321e+01 2.575e+01 2.805e+01 3.958e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 06:20:54,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3742030.0, ans=0.1 2024-08-18 06:20:55,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.75 vs. limit=15.0 2024-08-18 06:21:19,616 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 17 from Vox, 35 from AS 2024-08-18 06:21:23,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2100, loss[loss=0.1118, beats_loss=0.01202, ecapa_loss=0.0001496, whisper_loss=0.0983, over 21479.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01054, ecapa_loss=0.0001409, whisper_loss=0.08863, over 3798350.07 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:21:33,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2024-08-18 06:21:39,100 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 06:21:42,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2024-08-18 06:21:45,076 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
27 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 06:22:26,798 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2150, loss[loss=0.07269, beats_loss=0.01387, ecapa_loss=0.0001094, whisper_loss=0.05773, over 13794.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01066, ecapa_loss=0.0001405, whisper_loss=0.08868, over 3790437.91 frames. ], batch size: 54, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:22:30,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3742730.0, ans=0.125 2024-08-18 06:22:32,665 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 06:22:42,023 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 11 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 06:22:42,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2024-08-18 06:22:56,435 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 16 from Vox, 26 from AS 2024-08-18 06:23:00,715 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 from AS 2024-08-18 06:23:03,704 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS 2024-08-18 06:23:05,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3742930.0, ans=0.125 2024-08-18 06:23:07,302 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.693e+01 2.305e+01 2.600e+01 2.971e+01 3.562e+02, threshold=5.201e+01, percent-clipped=4.0 2024-08-18 06:23:18,528 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
30 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 06:23:39,384 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2200, loss[loss=0.1132, beats_loss=0.008007, ecapa_loss=0.0001588, whisper_loss=0.1036, over 16198.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01063, ecapa_loss=0.0001404, whisper_loss=0.08916, over 3793942.51 frames. ], batch size: 63, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:24:11,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3743430.0, ans=0.125 2024-08-18 06:24:19,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3743430.0, ans=0.125 2024-08-18 06:24:23,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3743430.0, ans=0.0 2024-08-18 06:24:30,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3743530.0, ans=0.1 2024-08-18 06:24:53,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3743630.0, ans=6.0 2024-08-18 06:24:59,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2250, loss[loss=0.1067, beats_loss=0.0102, ecapa_loss=0.0001329, whisper_loss=0.09518, over 17999.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001414, whisper_loss=0.09012, over 3815448.06 frames. ], batch size: 68, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:25:12,050 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 13 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 06:25:14,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.46 vs. 
limit=15.0 2024-08-18 06:25:23,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3743830.0, ans=0.1 2024-08-18 06:25:25,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.21 vs. limit=22.5 2024-08-18 06:25:33,567 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 22 from Vox, 41 from AS 2024-08-18 06:25:47,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.275e+01 2.589e+01 2.957e+01 4.064e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 06:25:52,481 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 24 from Vox, 31 from AS 2024-08-18 06:25:57,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3744030.0, ans=0.125 2024-08-18 06:26:00,823 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 17 from Vox, 45 from AS 2024-08-18 06:26:20,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2300, loss[loss=0.09911, beats_loss=0.009886, ecapa_loss=0.0001447, whisper_loss=0.08778, over 18507.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01065, ecapa_loss=0.0001413, whisper_loss=0.09002, over 3855759.76 frames. ], batch size: 74, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:26:39,048 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 from AS 2024-08-18 06:26:55,799 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 18 from Vox, 29 from AS 2024-08-18 06:26:58,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.11 vs. limit=22.5 2024-08-18 06:27:00,365 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 26 from Vox, 27 from AS 2024-08-18 06:27:03,054 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 06:27:06,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3744430.0, ans=0.125 2024-08-18 06:27:14,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3744530.0, ans=0.0 2024-08-18 06:27:42,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2350, loss[loss=0.106, beats_loss=0.01049, ecapa_loss=0.0001353, whisper_loss=0.09416, over 21760.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001427, whisper_loss=0.09016, over 3859394.95 frames. ], batch size: 84, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:27:47,311 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 13 from Vox, 44 from AS 2024-08-18 06:28:29,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.243e+01 2.499e+01 2.727e+01 3.618e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-18 06:28:32,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3745030.0, ans=0.1 2024-08-18 06:28:43,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-08-18 06:28:55,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3745130.0, ans=0.0 2024-08-18 06:29:04,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2400, loss[loss=0.1255, beats_loss=0.01011, ecapa_loss=0.0001458, whisper_loss=0.1139, over 22615.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.0906, over 3895064.29 frames. 
], batch size: 89, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:29:15,871 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.016e-01 2024-08-18 06:29:15,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3745230.0, ans=0.125 2024-08-18 06:29:19,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3745330.0, ans=0.125 2024-08-18 06:29:20,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-08-18 06:29:28,824 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 28 from Vox, 36 from AS 2024-08-18 06:29:36,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3745430.0, ans=0.0 2024-08-18 06:29:46,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3745430.0, ans=0.0 2024-08-18 06:29:55,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3745530.0, ans=0.1 2024-08-18 06:30:01,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3745530.0, ans=0.2 2024-08-18 06:30:05,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3745630.0, ans=0.125 2024-08-18 06:30:07,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.93 vs. 
limit=10.0 2024-08-18 06:30:15,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-18 06:30:19,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2450, loss[loss=0.1008, beats_loss=0.01081, ecapa_loss=0.0001841, whisper_loss=0.08812, over 22582.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001427, whisper_loss=0.09072, over 3893535.60 frames. ], batch size: 92, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:30:45,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3745830.0, ans=0.125 2024-08-18 06:31:08,961 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.751e+01 2.283e+01 2.536e+01 2.779e+01 5.169e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 06:31:11,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3746030.0, ans=0.125 2024-08-18 06:31:11,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3746030.0, ans=0.125 2024-08-18 06:31:15,583 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 14 from Vox, 40 from AS 2024-08-18 06:31:20,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3746030.0, ans=0.2 2024-08-18 06:31:23,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3746030.0, ans=0.0 2024-08-18 06:31:40,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2500, loss[loss=0.1151, beats_loss=0.009957, ecapa_loss=0.0001225, whisper_loss=0.104, over 23333.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001432, whisper_loss=0.08997, over 3879312.36 frames. 
], batch size: 91, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:31:41,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3746230.0, ans=0.1 2024-08-18 06:32:17,123 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 06:32:18,964 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 28 from Vox, 41 from AS 2024-08-18 06:32:29,324 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 30 from Vox, 30 from AS 2024-08-18 06:32:52,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3746630.0, ans=0.0 2024-08-18 06:32:53,561 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 16 from Vox, 25 from AS 2024-08-18 06:32:57,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2550, loss[loss=0.08075, beats_loss=0.007174, ecapa_loss=0.0001626, whisper_loss=0.07195, over 14109.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001434, whisper_loss=0.09039, over 3871890.71 frames. ], batch size: 53, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:32:58,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3746730.0, ans=0.125 2024-08-18 06:33:02,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. 
limit=22.5 2024-08-18 06:33:03,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3746730.0, ans=0.5 2024-08-18 06:33:12,902 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.553e-02 2024-08-18 06:33:18,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3746830.0, ans=0.0 2024-08-18 06:33:27,306 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 28 from Vox, 34 from AS 2024-08-18 06:33:27,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3746930.0, ans=0.125 2024-08-18 06:33:27,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3746930.0, ans=0.125 2024-08-18 06:33:31,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.50 vs. limit=15.0 2024-08-18 06:33:41,714 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.402e+01 2.684e+01 2.879e+01 3.926e+01, threshold=5.368e+01, percent-clipped=1.0 2024-08-18 06:33:50,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3747030.0, ans=0.125 2024-08-18 06:34:13,255 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2600, loss[loss=0.08433, beats_loss=0.01107, ecapa_loss=0.000119, whisper_loss=0.07207, over 13733.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001441, whisper_loss=0.0903, over 3854169.72 frames. ], batch size: 53, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:34:14,992 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
27 from LS+wenet, 21 from Vox, 43 from AS 2024-08-18 06:34:17,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. limit=10.0 2024-08-18 06:34:19,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3747230.0, ans=0.0 2024-08-18 06:34:44,719 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 28 from Vox, 34 from AS 2024-08-18 06:34:47,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2024-08-18 06:35:00,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3747530.0, ans=0.0 2024-08-18 06:35:03,061 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 17 from Vox, 19 from AS 2024-08-18 06:35:11,854 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 17 from Vox, 45 from AS 2024-08-18 06:35:18,208 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 06:35:27,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.06 vs. limit=22.5 2024-08-18 06:35:29,323 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2650, loss[loss=0.08727, beats_loss=0.009127, ecapa_loss=0.0001793, whisper_loss=0.07635, over 15222.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001438, whisper_loss=0.09034, over 3873571.46 frames. ], batch size: 62, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:35:41,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.63 vs. 
limit=10.0 2024-08-18 06:35:55,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3747830.0, ans=0.125 2024-08-18 06:35:56,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3747830.0, ans=0.2 2024-08-18 06:36:05,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3747930.0, ans=0.0 2024-08-18 06:36:09,766 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 30 from Vox, 34 from AS 2024-08-18 06:36:12,414 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.431e+01 2.791e+01 3.155e+01 3.699e+02, threshold=5.582e+01, percent-clipped=1.0 2024-08-18 06:36:19,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3748030.0, ans=0.125 2024-08-18 06:36:29,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3748130.0, ans=0.0 2024-08-18 06:36:34,205 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 25 from Vox, 19 from AS 2024-08-18 06:36:40,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3748130.0, ans=0.0 2024-08-18 06:36:40,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3748130.0, ans=0.0 2024-08-18 06:36:46,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2700, loss[loss=0.1363, beats_loss=0.007145, ecapa_loss=0.0001155, whisper_loss=0.128, over 17461.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001433, whisper_loss=0.08978, over 3887620.92 frames. 
], batch size: 65, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:36:52,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3748230.0, ans=0.125 2024-08-18 06:36:59,466 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 19 from Vox, 23 from AS 2024-08-18 06:37:25,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3748430.0, ans=0.125 2024-08-18 06:37:25,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-18 06:37:53,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3748630.0, ans=0.0 2024-08-18 06:38:05,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2750, loss[loss=0.1112, beats_loss=0.008358, ecapa_loss=0.0001621, whisper_loss=0.1012, over 17318.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01053, ecapa_loss=0.0001425, whisper_loss=0.0895, over 3852041.18 frames. ], batch size: 68, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:38:13,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.42 vs. limit=22.5 2024-08-18 06:38:32,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3748830.0, ans=0.125 2024-08-18 06:38:37,295 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-18 06:38:40,149 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 25 from Vox, 38 from AS 2024-08-18 06:38:51,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.293e+01 2.523e+01 2.815e+01 3.785e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-18 06:38:54,548 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 from AS 2024-08-18 06:38:55,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3749030.0, ans=0.1 2024-08-18 06:39:01,612 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 19 from Vox, 33 from AS 2024-08-18 06:39:28,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2800, loss[loss=0.1021, beats_loss=0.01148, ecapa_loss=0.0001933, whisper_loss=0.08866, over 16980.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001421, whisper_loss=0.08989, over 3855405.67 frames. ], batch size: 72, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:39:29,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3749230.0, ans=0.125 2024-08-18 06:39:33,355 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-18 06:39:39,885 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS 2024-08-18 06:39:45,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3749330.0, ans=0.0 2024-08-18 06:39:46,938 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 from AS 2024-08-18 06:39:55,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. 
limit=10.0 2024-08-18 06:40:07,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3749430.0, ans=10.0 2024-08-18 06:40:15,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3749430.0, ans=0.125 2024-08-18 06:40:19,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3749530.0, ans=0.125 2024-08-18 06:40:20,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3749530.0, ans=0.125 2024-08-18 06:40:21,793 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 22 from Vox, 33 from AS 2024-08-18 06:40:38,872 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 from AS 2024-08-18 06:40:51,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2850, loss[loss=0.1114, beats_loss=0.01256, ecapa_loss=0.0001231, whisper_loss=0.09758, over 21155.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001423, whisper_loss=0.08964, over 3861865.32 frames. ], batch size: 83, lr: 2.37e-03, grad_scale: 1.152921504606847e+18 2024-08-18 06:40:55,109 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 22 from Vox, 43 from AS 2024-08-18 06:41:00,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3749730.0, ans=0.125 2024-08-18 06:41:18,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3749830.0, ans=0.125 2024-08-18 06:41:27,138 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 22 from Vox, 47 from AS 2024-08-18 06:41:33,546 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
19 from LS+wenet, 18 from Vox, 38 from AS 2024-08-18 06:41:38,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3749930.0, ans=0.125 2024-08-18 06:41:42,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.356e+01 2.615e+01 2.993e+01 1.081e+02, threshold=5.230e+01, percent-clipped=3.0 2024-08-18 06:42:02,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0 2024-08-18 06:42:13,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=15.0 2024-08-18 06:42:16,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2900, loss[loss=0.08837, beats_loss=0.01295, ecapa_loss=0.0001732, whisper_loss=0.07368, over 21767.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.0001426, whisper_loss=0.08921, over 3857694.07 frames. ], batch size: 95, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:43:23,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3750630.0, ans=0.125 2024-08-18 06:43:25,665 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 06:43:29,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 2950, loss[loss=0.09749, beats_loss=0.01152, ecapa_loss=0.0001229, whisper_loss=0.08474, over 14762.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0107, ecapa_loss=0.0001428, whisper_loss=0.08949, over 3872954.15 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:43:37,959 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 06:43:50,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3750830.0, ans=0.125 2024-08-18 06:44:01,828 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 from AS 2024-08-18 06:44:05,897 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 06:44:09,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.710e+01 2.348e+01 2.612e+01 2.875e+01 5.578e+01, threshold=5.225e+01, percent-clipped=1.0 2024-08-18 06:44:10,961 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS 2024-08-18 06:44:11,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3751030.0, ans=0.125 2024-08-18 06:44:35,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3000, loss[loss=0.1365, beats_loss=0.00791, ecapa_loss=0.0001901, whisper_loss=0.1267, over 19619.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.0001435, whisper_loss=0.08982, over 3872292.63 frames. ], batch size: 77, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:44:35,894 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 06:45:15,288 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005294, whisper_loss=0.2485, over 922467.00 frames. 2024-08-18 06:45:23,071 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.1983, 1.6331, 1.9570, 0.9444, 1.2901, 1.5139, 1.8949, 1.8252], device='cuda:3') 2024-08-18 06:45:31,812 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on SV_voxceleb1: loss=0.004081, beats_loss=0, ecapa_loss=0.0004081, whisper_loss=0, over 939242.00 frames. 
2024-08-18 06:45:39,882 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.5911, 1.8941, 2.3229, 1.1240], device='cuda:3') 2024-08-18 06:47:15,200 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on AT_audioset: loss=0.02323, beats_loss=0.02323, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 06:47:15,210 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 06:47:20,563 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 06:47:28,396 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-18 06:47:32,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3751330.0, ans=0.125 2024-08-18 06:47:41,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3751430.0, ans=0.125 2024-08-18 06:47:47,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.33 vs. limit=6.0 2024-08-18 06:48:02,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3751530.0, ans=0.0 2024-08-18 06:48:03,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3751530.0, ans=0.0 2024-08-18 06:48:21,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3050, loss[loss=0.09269, beats_loss=0.01234, ecapa_loss=0.0001338, whisper_loss=0.07901, over 22666.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.0001439, whisper_loss=0.09086, over 3877880.26 frames. 
], batch size: 92, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:48:30,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3751730.0, ans=0.125 2024-08-18 06:48:57,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0 2024-08-18 06:49:01,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.301e+01 2.557e+01 2.884e+01 4.342e+01, threshold=5.114e+01, percent-clipped=0.0 2024-08-18 06:49:18,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3752130.0, ans=0.125 2024-08-18 06:49:28,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3100, loss[loss=0.09854, beats_loss=0.0143, ecapa_loss=0.0001174, whisper_loss=0.08306, over 21312.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.09162, over 3877607.04 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:49:38,859 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 06:49:45,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3752330.0, ans=0.0 2024-08-18 06:50:36,416 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3150, loss[loss=0.0953, beats_loss=0.01041, ecapa_loss=9.828e-05, whisper_loss=0.0839, over 19357.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01053, ecapa_loss=0.0001456, whisper_loss=0.09129, over 3850703.84 frames. 
], batch size: 71, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:50:41,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3752730.0, ans=0.07 2024-08-18 06:50:45,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-18 06:50:52,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3752830.0, ans=0.0 2024-08-18 06:51:04,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3752930.0, ans=0.2 2024-08-18 06:51:07,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.84 vs. limit=22.5 2024-08-18 06:51:08,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3752930.0, ans=0.0 2024-08-18 06:51:11,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3752930.0, ans=0.125 2024-08-18 06:51:17,308 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.299e+01 2.506e+01 2.759e+01 4.272e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-18 06:51:34,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3753130.0, ans=0.125 2024-08-18 06:51:44,057 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3200, loss[loss=0.1023, beats_loss=0.01005, ecapa_loss=0.0001656, whisper_loss=0.09056, over 21731.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001465, whisper_loss=0.09079, over 3840187.12 frames. 
], batch size: 88, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:51:45,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3753230.0, ans=0.125 2024-08-18 06:51:49,609 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 06:51:51,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=12.0 2024-08-18 06:51:53,456 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 06:51:54,786 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 06:52:12,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3753430.0, ans=0.125 2024-08-18 06:52:15,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2024-08-18 06:52:21,260 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 06:52:39,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3753630.0, ans=0.125 2024-08-18 06:52:50,677 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3250, loss[loss=0.09152, beats_loss=0.009311, ecapa_loss=0.000179, whisper_loss=0.08042, over 19075.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01064, ecapa_loss=0.0001464, whisper_loss=0.09049, over 3818989.50 frames. ], batch size: 81, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:53:14,828 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
30 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 06:53:30,579 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.285e+01 2.505e+01 2.811e+01 5.288e+01, threshold=5.010e+01, percent-clipped=1.0 2024-08-18 06:53:31,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3754030.0, ans=0.125 2024-08-18 06:53:40,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-18 06:53:57,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3300, loss[loss=0.1112, beats_loss=0.008953, ecapa_loss=0.0001943, whisper_loss=0.1003, over 20877.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01065, ecapa_loss=0.0001467, whisper_loss=0.09068, over 3864003.57 frames. ], batch size: 87, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:53:59,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3754230.0, ans=0.0 2024-08-18 06:54:04,758 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2024-08-18 06:54:15,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3754330.0, ans=0.0 2024-08-18 06:54:22,202 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
20 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 06:54:32,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3754430.0, ans=0.0 2024-08-18 06:54:52,320 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08354800194501877, model_norm_threshold=50.10049057006836 2024-08-18 06:54:52,485 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.148e+04, grad_sumsq=7.148e+04, orig_rms_sq=1.000e+00 2024-08-18 06:54:56,732 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 33 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 06:55:05,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3350, loss[loss=0.09332, beats_loss=0.01175, ecapa_loss=0.0001323, whisper_loss=0.08024, over 21815.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01064, ecapa_loss=0.0001458, whisper_loss=0.09057, over 3850727.93 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:55:14,391 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 18 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 06:55:21,520 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 06:55:25,234 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 06:55:45,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.324e+01 2.611e+01 2.907e+01 5.997e+02, threshold=5.222e+01, percent-clipped=2.0 2024-08-18 06:55:55,503 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 06:56:05,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.61 vs. 
limit=15.0 2024-08-18 06:56:05,759 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-08-18 06:56:12,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3400, loss[loss=0.09805, beats_loss=0.01065, ecapa_loss=0.0001065, whisper_loss=0.08634, over 22429.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01059, ecapa_loss=0.000145, whisper_loss=0.09063, over 3883036.93 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 5.764607523034235e+17 2024-08-18 06:56:18,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3755230.0, ans=0.1 2024-08-18 06:56:21,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3755230.0, ans=0.1 2024-08-18 06:56:35,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3755330.0, ans=0.2 2024-08-18 06:57:09,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3755630.0, ans=0.0 2024-08-18 06:57:11,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3755630.0, ans=0.2 2024-08-18 06:57:13,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3755630.0, ans=0.125 2024-08-18 06:57:13,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3755630.0, ans=0.125 2024-08-18 06:57:26,962 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3450, loss[loss=0.09734, beats_loss=0.01201, ecapa_loss=0.0001222, whisper_loss=0.08411, over 17795.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01069, ecapa_loss=0.0001453, whisper_loss=0.08991, over 3888415.45 frames. ], batch size: 71, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 06:57:56,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3755830.0, ans=10.0 2024-08-18 06:58:13,950 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 06:58:15,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3755930.0, ans=0.1 2024-08-18 06:58:21,267 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.293e+01 2.573e+01 2.898e+01 3.051e+02, threshold=5.147e+01, percent-clipped=2.0 2024-08-18 06:58:40,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3756130.0, ans=0.125 2024-08-18 06:58:54,590 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3500, loss[loss=0.09633, beats_loss=0.01116, ecapa_loss=0.0001516, whisper_loss=0.08365, over 22077.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001456, whisper_loss=0.0899, over 3899826.21 frames. ], batch size: 92, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 06:58:55,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.43 vs. limit=22.5 2024-08-18 06:59:03,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3756230.0, ans=0.125 2024-08-18 06:59:18,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.85 vs. 
limit=15.0 2024-08-18 06:59:24,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3756330.0, ans=0.125 2024-08-18 06:59:30,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-18 06:59:30,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3756330.0, ans=15.0 2024-08-18 07:00:01,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3756530.0, ans=0.2 2024-08-18 07:00:23,620 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 07:00:35,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3550, loss[loss=0.1094, beats_loss=0.01072, ecapa_loss=0.0001257, whisper_loss=0.09746, over 19620.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001447, whisper_loss=0.09017, over 3904603.27 frames. ], batch size: 75, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:00:55,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3756830.0, ans=0.125 2024-08-18 07:01:01,655 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 18 from Vox, 13 fro AS 2024-08-18 07:01:10,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3756830.0, ans=0.125 2024-08-18 07:01:30,096 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 34 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 07:01:34,746 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
35 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 07:01:34,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3757030.0, ans=0.125 2024-08-18 07:01:35,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.288e+01 2.493e+01 2.847e+01 8.952e+01, threshold=4.987e+01, percent-clipped=1.0 2024-08-18 07:01:37,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3757030.0, ans=0.05 2024-08-18 07:01:42,783 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 07:01:57,056 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-18 07:02:09,415 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3600, loss[loss=0.1066, beats_loss=0.01035, ecapa_loss=0.0001776, whisper_loss=0.09445, over 22239.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001457, whisper_loss=0.0904, over 3870299.53 frames. ], batch size: 93, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:02:25,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3757330.0, ans=0.125 2024-08-18 07:02:32,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3757330.0, ans=0.125 2024-08-18 07:02:35,650 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. 
limit=22.5 2024-08-18 07:02:41,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3757430.0, ans=0.1 2024-08-18 07:02:46,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3757430.0, ans=0.0 2024-08-18 07:02:53,603 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 07:03:01,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3757530.0, ans=0.1 2024-08-18 07:03:08,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3757630.0, ans=0.1 2024-08-18 07:03:18,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3650, loss[loss=0.08173, beats_loss=0.01169, ecapa_loss=0.0001561, whisper_loss=0.06848, over 14984.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001446, whisper_loss=0.08982, over 3874446.65 frames. ], batch size: 59, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:03:27,173 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 07:03:46,373 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 15 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 07:03:46,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3757930.0, ans=0.125 2024-08-18 07:03:49,518 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 07:03:57,097 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
28 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 07:04:02,326 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.285e+01 2.488e+01 2.673e+01 1.240e+02, threshold=4.975e+01, percent-clipped=2.0 2024-08-18 07:04:27,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3758130.0, ans=0.125 2024-08-18 07:04:29,319 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2024-08-18 07:04:29,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3700, loss[loss=0.08407, beats_loss=0.01274, ecapa_loss=0.0001094, whisper_loss=0.07024, over 22323.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001439, whisper_loss=0.08962, over 3857219.97 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:04:35,846 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 07:04:45,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.26 vs. limit=10.0 2024-08-18 07:04:48,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3758330.0, ans=0.05 2024-08-18 07:05:16,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.74 vs. limit=15.0 2024-08-18 07:05:27,819 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
16 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 07:05:29,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3758630.0, ans=0.0 2024-08-18 07:05:35,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3758630.0, ans=0.2 2024-08-18 07:05:39,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3750, loss[loss=0.09018, beats_loss=0.009771, ecapa_loss=0.0001876, whisper_loss=0.07854, over 13897.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001445, whisper_loss=0.08997, over 3879427.80 frames. ], batch size: 58, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:06:16,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3758930.0, ans=0.0 2024-08-18 07:06:19,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3758930.0, ans=0.2 2024-08-18 07:06:22,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.345e+01 2.587e+01 2.858e+01 2.449e+02, threshold=5.174e+01, percent-clipped=2.0 2024-08-18 07:06:34,543 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2024-08-18 07:06:37,776 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 07:06:47,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3800, loss[loss=0.1104, beats_loss=0.009748, ecapa_loss=0.0001374, whisper_loss=0.09931, over 21737.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001454, whisper_loss=0.08947, over 3864040.61 frames. 
], batch size: 83, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:06:49,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-18 07:06:50,712 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 07:06:51,220 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.90 vs. limit=22.5 2024-08-18 07:06:51,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3759230.0, ans=0.125 2024-08-18 07:07:17,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3759430.0, ans=0.125 2024-08-18 07:07:33,036 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-18 07:07:36,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3759530.0, ans=0.0 2024-08-18 07:07:52,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3850, loss[loss=0.07261, beats_loss=0.01457, ecapa_loss=0.0001426, whisper_loss=0.05662, over 21768.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001469, whisper_loss=0.0893, over 3852493.81 frames. 
], batch size: 95, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:07:55,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3759730.0, ans=0.2 2024-08-18 07:07:59,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3759730.0, ans=0.125 2024-08-18 07:08:00,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3759730.0, ans=0.2 2024-08-18 07:08:17,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3759930.0, ans=0.125 2024-08-18 07:08:17,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3759930.0, ans=0.125 2024-08-18 07:08:20,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3759930.0, ans=0.125 2024-08-18 07:08:20,928 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 07:08:23,219 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 07:08:34,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3760030.0, ans=0.0 2024-08-18 07:08:35,273 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.367e+01 2.592e+01 2.770e+01 3.521e+02, threshold=5.184e+01, percent-clipped=1.0 2024-08-18 07:08:45,500 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 07:08:59,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3900, loss[loss=0.1085, beats_loss=0.01025, ecapa_loss=0.000158, whisper_loss=0.09665, over 20874.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001472, whisper_loss=0.09047, over 3865410.04 frames. ], batch size: 86, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:09:01,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3760230.0, ans=0.1 2024-08-18 07:09:13,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3760330.0, ans=0.125 2024-08-18 07:09:15,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3760330.0, ans=0.95 2024-08-18 07:09:48,705 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 07:10:04,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 3950, loss[loss=0.08656, beats_loss=0.01032, ecapa_loss=0.0001617, whisper_loss=0.07462, over 21601.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01035, ecapa_loss=0.0001479, whisper_loss=0.09077, over 3884980.71 frames. ], batch size: 90, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:10:09,691 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 07:10:12,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3760730.0, ans=0.125 2024-08-18 07:10:30,484 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 07:10:32,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.28 vs. limit=22.5 2024-08-18 07:10:39,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.63 vs. 
limit=15.0 2024-08-18 07:10:43,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3761030.0, ans=0.125 2024-08-18 07:10:44,185 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.379e+01 2.624e+01 2.905e+01 3.854e+01, threshold=5.247e+01, percent-clipped=0.0 2024-08-18 07:10:46,966 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 11 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-18 07:11:00,832 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 26 from Vox, 21 fro AS 2024-08-18 07:11:02,128 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 07:11:02,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3761130.0, ans=0.0 2024-08-18 07:11:04,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3761130.0, ans=0.125 2024-08-18 07:11:06,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3761130.0, ans=0.2 2024-08-18 07:11:08,234 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-08-18 07:11:08,685 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4000, loss[loss=0.07141, beats_loss=0.01056, ecapa_loss=0.0001367, whisper_loss=0.05949, over 19498.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01035, ecapa_loss=0.000148, whisper_loss=0.09096, over 3916009.32 frames. 
], batch size: 78, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:11:28,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3761330.0, ans=0.125 2024-08-18 07:11:33,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3761430.0, ans=0.1 2024-08-18 07:11:34,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3761430.0, ans=10.0 2024-08-18 07:11:47,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3761530.0, ans=0.125 2024-08-18 07:11:48,605 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 07:12:04,054 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08813267201185226, model_norm_threshold=52.47489929199219 2024-08-18 07:12:04,214 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.229e+04, grad_sumsq=4.229e+04, orig_rms_sq=1.000e+00 2024-08-18 07:12:05,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3761630.0, ans=0.1 2024-08-18 07:12:07,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3761630.0, ans=0.07 2024-08-18 07:12:07,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.40 vs. limit=10.0 2024-08-18 07:12:12,216 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
21 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-18 07:12:13,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4050, loss[loss=0.1077, beats_loss=0.01113, ecapa_loss=0.0001208, whisper_loss=0.09532, over 14677.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001482, whisper_loss=0.09084, over 3938017.26 frames. ], batch size: 56, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:12:19,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3761730.0, ans=0.1 2024-08-18 07:12:22,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3761730.0, ans=0.1 2024-08-18 07:12:32,307 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 07:12:33,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3761830.0, ans=0.0 2024-08-18 07:12:47,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.26 vs. 
limit=22.5 2024-08-18 07:12:53,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3762030.0, ans=0.0 2024-08-18 07:12:54,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.303e+01 2.573e+01 2.954e+01 5.954e+02, threshold=5.146e+01, percent-clipped=3.0 2024-08-18 07:13:14,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3762130.0, ans=0.125 2024-08-18 07:13:16,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3762130.0, ans=0.0 2024-08-18 07:13:18,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4100, loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001488, whisper_loss=0.0909, over 21409.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001473, whisper_loss=0.09054, over 3911334.83 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:13:19,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3762230.0, ans=0.0 2024-08-18 07:13:29,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3762230.0, ans=0.2 2024-08-18 07:13:33,609 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 39 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-18 07:13:36,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3762330.0, ans=0.0 2024-08-18 07:13:38,600 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
29 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-18 07:13:45,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3762430.0, ans=0.125 2024-08-18 07:14:02,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3762530.0, ans=0.0 2024-08-18 07:14:05,947 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 07:14:22,247 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4150, loss[loss=0.1102, beats_loss=0.01154, ecapa_loss=0.0001199, whisper_loss=0.09743, over 22585.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001464, whisper_loss=0.09085, over 3903475.32 frames. ], batch size: 86, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:14:49,629 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:14:49,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3762930.0, ans=0.125 2024-08-18 07:14:54,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.93 vs. limit=15.0 2024-08-18 07:14:54,475 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 07:14:56,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3762930.0, ans=10.0 2024-08-18 07:14:59,515 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
14 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 07:15:01,857 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.335e+01 2.524e+01 2.797e+01 3.665e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 07:15:12,379 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-18 07:15:14,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3763130.0, ans=0.125 2024-08-18 07:15:23,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3763130.0, ans=0.1 2024-08-18 07:15:26,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4200, loss[loss=0.1226, beats_loss=0.009281, ecapa_loss=0.0001095, whisper_loss=0.1122, over 19265.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.09122, over 3910868.05 frames. ], batch size: 69, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:15:28,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3763230.0, ans=0.125 2024-08-18 07:15:30,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3763230.0, ans=0.125 2024-08-18 07:15:44,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5 2024-08-18 07:16:10,499 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 07:16:23,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3763630.0, ans=0.125 2024-08-18 07:16:25,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3763630.0, ans=0.0 2024-08-18 07:16:30,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4250, loss[loss=0.08996, beats_loss=0.01269, ecapa_loss=0.0001249, whisper_loss=0.07602, over 21421.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01063, ecapa_loss=0.000145, whisper_loss=0.09004, over 3913753.49 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:16:31,827 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 07:16:36,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3763730.0, ans=0.125 2024-08-18 07:16:36,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.37 vs. limit=15.0 2024-08-18 07:16:46,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3763830.0, ans=0.125 2024-08-18 07:16:50,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.13 vs. limit=22.5 2024-08-18 07:16:58,887 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 07:17:07,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3763930.0, ans=0.0 2024-08-18 07:17:09,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3764030.0, ans=0.0 2024-08-18 07:17:10,781 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.296e+01 2.590e+01 3.069e+01 5.204e+01, threshold=5.180e+01, percent-clipped=2.0 2024-08-18 07:17:34,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4300, loss[loss=0.1291, beats_loss=0.008419, ecapa_loss=0.0001316, whisper_loss=0.1193, over 20047.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01057, ecapa_loss=0.000145, whisper_loss=0.09078, over 3931523.38 frames. ], batch size: 75, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:17:54,664 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 07:17:57,157 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 11 from Vox, 47 fro AS 2024-08-18 07:18:06,275 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 07:18:10,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3764430.0, ans=0.0 2024-08-18 07:18:28,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3764630.0, ans=0.0 2024-08-18 07:18:39,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4350, loss[loss=0.0836, beats_loss=0.01251, ecapa_loss=0.0001209, whisper_loss=0.06989, over 15988.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01062, ecapa_loss=0.0001455, whisper_loss=0.09041, over 3890423.17 frames. 
], batch size: 64, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:18:40,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0 2024-08-18 07:18:49,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2024-08-18 07:19:08,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3764930.0, ans=0.125 2024-08-18 07:19:18,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.231e+01 2.499e+01 2.816e+01 6.235e+01, threshold=4.999e+01, percent-clipped=1.0 2024-08-18 07:19:26,827 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 07:19:40,944 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 07:19:43,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4400, loss[loss=0.1356, beats_loss=0.008805, ecapa_loss=0.0001378, whisper_loss=0.1255, over 23771.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001459, whisper_loss=0.091, over 3879215.07 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:19:46,141 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 14 from Vox, 44 fro AS 2024-08-18 07:19:57,494 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 07:20:05,402 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 11 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 07:20:09,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.34 vs. 
limit=15.0 2024-08-18 07:20:38,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3765630.0, ans=0.125 2024-08-18 07:20:41,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2024-08-18 07:20:47,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4450, loss[loss=0.1084, beats_loss=0.0107, ecapa_loss=0.0001487, whisper_loss=0.09619, over 22494.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001446, whisper_loss=0.09083, over 3897623.90 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:21:20,324 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 07:21:24,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3766030.0, ans=0.1 2024-08-18 07:21:26,927 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.341e+01 2.651e+01 2.929e+01 6.825e+01, threshold=5.301e+01, percent-clipped=1.0 2024-08-18 07:21:27,086 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-18 07:21:28,747 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.179e+01 2024-08-18 07:21:34,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-08-18 07:21:47,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3766130.0, ans=0.1 2024-08-18 07:21:51,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4500, loss[loss=0.1157, beats_loss=0.008121, ecapa_loss=0.0001362, whisper_loss=0.1062, over 14142.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001442, whisper_loss=0.0907, over 3881822.92 frames. ], batch size: 53, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:21:53,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3766230.0, ans=0.125 2024-08-18 07:21:54,008 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 07:22:08,465 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:22:10,657 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 15 from Vox, 49 fro AS 2024-08-18 07:22:11,819 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 07:22:23,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.01 vs. limit=22.5 2024-08-18 07:22:29,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3766530.0, ans=0.125 2024-08-18 07:22:34,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3766530.0, ans=0.125 2024-08-18 07:22:49,937 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.76 vs. 
limit=15.0 2024-08-18 07:22:55,337 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4550, loss[loss=0.1148, beats_loss=0.009817, ecapa_loss=0.0001575, whisper_loss=0.1034, over 22566.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.09103, over 3914366.08 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:23:35,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.270e+01 2.466e+01 2.642e+01 3.409e+01, threshold=4.932e+01, percent-clipped=0.0 2024-08-18 07:23:56,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3767130.0, ans=0.2 2024-08-18 07:23:59,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4600, loss[loss=0.081, beats_loss=0.0144, ecapa_loss=0.0001031, whisper_loss=0.06557, over 16938.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001439, whisper_loss=0.09057, over 3902262.24 frames. ], batch size: 69, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:24:00,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3767230.0, ans=0.125 2024-08-18 07:24:00,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.65 vs. limit=12.0 2024-08-18 07:24:10,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.84 vs. limit=15.0 2024-08-18 07:24:28,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3767430.0, ans=0.125 2024-08-18 07:24:58,469 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 07:25:04,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4650, loss[loss=0.1226, beats_loss=0.009458, ecapa_loss=0.0001673, whisper_loss=0.1115, over 23211.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.000144, whisper_loss=0.09024, over 3886674.40 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:25:38,706 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=7.956e-01 2024-08-18 07:25:43,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3768030.0, ans=0.0 2024-08-18 07:25:44,554 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.243e+01 2.444e+01 2.788e+01 4.840e+01, threshold=4.887e+01, percent-clipped=0.0 2024-08-18 07:25:47,209 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 07:25:52,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3768030.0, ans=0.1 2024-08-18 07:25:56,315 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 07:25:58,819 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 07:26:00,769 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.579e-03 2024-08-18 07:26:01,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3768130.0, ans=0.125 2024-08-18 07:26:02,745 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 07:26:04,029 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 07:26:06,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3768130.0, ans=0.09899494936611666 2024-08-18 07:26:07,823 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 07:26:08,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4700, loss[loss=0.1181, beats_loss=0.01122, ecapa_loss=0.0001438, whisper_loss=0.1054, over 22857.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.091, over 3890970.50 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:26:22,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3768330.0, ans=0.0 2024-08-18 07:26:24,816 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 07:26:26,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3768330.0, ans=0.125 2024-08-18 07:26:28,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3768330.0, ans=0.1 2024-08-18 07:26:33,572 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 07:26:41,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3768430.0, ans=0.125 2024-08-18 07:26:58,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3768530.0, ans=0.125 2024-08-18 07:27:03,980 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
23 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-18 07:27:05,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3768630.0, ans=0.125 2024-08-18 07:27:06,364 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 07:27:07,604 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 07:27:12,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4750, loss[loss=0.1057, beats_loss=0.00906, ecapa_loss=0.0001743, whisper_loss=0.09491, over 20435.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0105, ecapa_loss=0.0001448, whisper_loss=0.09105, over 3900715.37 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:27:29,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3768830.0, ans=0.1 2024-08-18 07:27:33,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3768830.0, ans=0.0 2024-08-18 07:27:36,746 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 07:27:38,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3768930.0, ans=0.09899494936611666 2024-08-18 07:27:51,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3769030.0, ans=0.125 2024-08-18 07:27:52,000 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.837e+01 2.283e+01 2.543e+01 2.859e+01 8.026e+01, threshold=5.085e+01, percent-clipped=1.0 2024-08-18 07:28:02,006 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 07:28:12,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2024-08-18 07:28:15,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4800, loss[loss=0.08314, beats_loss=0.01133, ecapa_loss=0.0001061, whisper_loss=0.07075, over 15449.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001467, whisper_loss=0.09063, over 3894446.89 frames. ], batch size: 58, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:28:20,793 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 07:28:21,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2024-08-18 07:28:25,873 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 07:28:40,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3769430.0, ans=0.125 2024-08-18 07:29:03,014 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 07:29:03,614 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.281e+00 2024-08-18 07:29:13,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3769630.0, ans=0.1 2024-08-18 07:29:15,757 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 07:29:19,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4850, loss[loss=0.09095, beats_loss=0.01276, ecapa_loss=0.0001404, whisper_loss=0.07679, over 22829.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001455, whisper_loss=0.09089, over 3905089.78 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:29:21,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3769730.0, ans=0.125 2024-08-18 07:29:27,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3769730.0, ans=0.0 2024-08-18 07:29:34,782 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 07:29:36,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3769830.0, ans=0.125 2024-08-18 07:29:38,222 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=10.0 2024-08-18 07:29:50,507 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 07:29:53,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3769930.0, ans=0.2 2024-08-18 07:29:59,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.248e+01 2.515e+01 2.808e+01 3.681e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-18 07:30:08,731 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-18 07:30:23,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4900, loss[loss=0.1109, beats_loss=0.01082, ecapa_loss=0.0001398, whisper_loss=0.09866, over 15474.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01057, ecapa_loss=0.0001461, whisper_loss=0.091, over 3887746.89 frames. 
], batch size: 62, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:30:25,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3770230.0, ans=0.125 2024-08-18 07:30:33,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3770230.0, ans=0.125 2024-08-18 07:30:37,193 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:31:02,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3770530.0, ans=0.125 2024-08-18 07:31:10,162 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 07:31:13,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=12.0 2024-08-18 07:31:20,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3770630.0, ans=0.125 2024-08-18 07:31:20,853 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2024-08-18 07:31:26,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3770730.0, ans=0.025 2024-08-18 07:31:27,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3770730.0, ans=0.125 2024-08-18 07:31:27,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 4950, loss[loss=0.1143, beats_loss=0.008997, ecapa_loss=0.0001576, whisper_loss=0.1037, over 22961.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001449, whisper_loss=0.0903, over 3877942.93 frames. 
], batch size: 92, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:31:28,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3770730.0, ans=0.1 2024-08-18 07:31:33,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3770730.0, ans=0.125 2024-08-18 07:31:38,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.38 vs. limit=15.0 2024-08-18 07:31:49,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.96 vs. limit=22.5 2024-08-18 07:31:53,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3770930.0, ans=0.125 2024-08-18 07:32:05,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3771030.0, ans=0.0 2024-08-18 07:32:06,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.529e+01 2.313e+01 2.600e+01 2.856e+01 8.113e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-18 07:32:31,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5000, loss[loss=0.116, beats_loss=0.01042, ecapa_loss=0.0001357, whisper_loss=0.1042, over 23624.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001461, whisper_loss=0.08998, over 3879830.30 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:32:34,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3771230.0, ans=15.0 2024-08-18 07:32:35,095 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 07:32:44,073 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 26 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 07:32:47,985 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 07:33:09,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3771530.0, ans=0.2 2024-08-18 07:33:13,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3771530.0, ans=0.125 2024-08-18 07:33:14,462 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-18 07:33:19,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.41 vs. limit=22.5 2024-08-18 07:33:25,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3771630.0, ans=0.2 2024-08-18 07:33:29,027 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 07:33:33,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3771630.0, ans=0.1 2024-08-18 07:33:35,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5050, loss[loss=0.1071, beats_loss=0.01087, ecapa_loss=0.0001673, whisper_loss=0.09453, over 22857.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01066, ecapa_loss=0.0001452, whisper_loss=0.08994, over 3882783.04 frames. 
], batch size: 93, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:33:51,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3771830.0, ans=0.0 2024-08-18 07:33:53,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3771830.0, ans=0.0 2024-08-18 07:34:15,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.257e+01 2.473e+01 2.747e+01 4.109e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 07:34:22,093 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 07:34:24,496 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 10 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 07:34:27,008 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 07:34:30,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3772130.0, ans=0.0 2024-08-18 07:34:36,573 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0 2024-08-18 07:34:39,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5100, loss[loss=0.09113, beats_loss=0.01164, ecapa_loss=0.0001406, whisper_loss=0.07808, over 14592.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01068, ecapa_loss=0.0001452, whisper_loss=0.08949, over 3873327.09 frames. ], batch size: 59, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:34:56,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3772330.0, ans=0.2 2024-08-18 07:34:57,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.78 vs. 
limit=12.0 2024-08-18 07:35:08,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3772430.0, ans=0.1 2024-08-18 07:35:30,024 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 07:35:40,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3772630.0, ans=0.125 2024-08-18 07:35:43,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5150, loss[loss=0.1048, beats_loss=0.01182, ecapa_loss=0.000139, whisper_loss=0.09159, over 22584.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01064, ecapa_loss=0.0001455, whisper_loss=0.08996, over 3884017.12 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:35:47,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3772730.0, ans=0.0 2024-08-18 07:36:12,515 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06933695822954178, model_norm_threshold=49.455631256103516 2024-08-18 07:36:12,674 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.293e+04, grad_sumsq=7.293e+04, orig_rms_sq=1.000e+00 2024-08-18 07:36:15,589 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 07:36:24,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.306e+01 2.541e+01 2.832e+01 7.133e+02, threshold=5.082e+01, percent-clipped=3.0 2024-08-18 07:36:30,498 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 07:36:33,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3773030.0, ans=0.0 2024-08-18 07:36:38,924 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 07:36:41,544 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 07:36:44,158 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 07:36:46,705 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 07:36:47,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3773130.0, ans=0.125 2024-08-18 07:36:48,911 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5200, loss[loss=0.07166, beats_loss=0.01185, ecapa_loss=0.0001559, whisper_loss=0.05826, over 18960.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001449, whisper_loss=0.08992, over 3872347.29 frames. ], batch size: 78, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:36:50,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3773230.0, ans=0.125 2024-08-18 07:37:16,680 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 07:37:21,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3773430.0, ans=0.125 2024-08-18 07:37:36,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3773530.0, ans=0.0 2024-08-18 07:37:43,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3773630.0, ans=0.2 2024-08-18 07:37:46,884 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 07:37:47,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=12.0 2024-08-18 07:37:54,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5250, loss[loss=0.1177, beats_loss=0.007225, ecapa_loss=0.0001824, whisper_loss=0.1086, over 17659.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001468, whisper_loss=0.09045, over 3866312.57 frames. ], batch size: 73, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:38:04,023 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=12.0 2024-08-18 07:38:19,864 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 07:38:29,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. 
limit=6.0 2024-08-18 07:38:39,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.417e+01 2.630e+01 2.894e+01 3.805e+02, threshold=5.260e+01, percent-clipped=2.0 2024-08-18 07:39:03,258 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:39:05,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5300, loss[loss=0.1152, beats_loss=0.01083, ecapa_loss=0.0001437, whisper_loss=0.103, over 19545.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001466, whisper_loss=0.09111, over 3899033.70 frames. ], batch size: 81, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:39:06,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3774230.0, ans=0.015 2024-08-18 07:39:16,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3774230.0, ans=0.125 2024-08-18 07:39:25,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3774330.0, ans=0.0 2024-08-18 07:39:29,670 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 07:39:59,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3774530.0, ans=0.0 2024-08-18 07:40:05,209 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=9.595e-02 2024-08-18 07:40:15,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5350, loss[loss=0.1052, beats_loss=0.01101, ecapa_loss=0.0001171, whisper_loss=0.093, over 22671.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.000145, whisper_loss=0.0911, over 3890004.18 frames. 
], batch size: 89, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:40:21,859 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 07:40:22,532 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.93 vs. limit=15.0 2024-08-18 07:40:24,310 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 07:40:26,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3774730.0, ans=0.0 2024-08-18 07:40:28,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3774830.0, ans=10.0 2024-08-18 07:40:32,246 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 07:40:32,711 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.444e+00 2024-08-18 07:40:41,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3774930.0, ans=0.1 2024-08-18 07:40:55,420 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.297e+01 2.541e+01 2.871e+01 2.091e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-18 07:41:01,676 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 07:41:05,488 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 07:41:18,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3775230.0, ans=0.125 2024-08-18 07:41:19,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5400, loss[loss=0.09066, beats_loss=0.01212, ecapa_loss=0.0001271, whisper_loss=0.07728, over 16389.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001438, whisper_loss=0.09063, over 3907087.72 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 07:41:20,624 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 07:41:24,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3775230.0, ans=0.2 2024-08-18 07:41:33,615 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-18 07:41:37,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3775330.0, ans=0.125 2024-08-18 07:41:38,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3775330.0, ans=0.0 2024-08-18 07:41:46,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3775430.0, ans=0.125 2024-08-18 07:42:23,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5450, loss[loss=0.1359, beats_loss=0.008718, ecapa_loss=0.0001368, whisper_loss=0.1258, over 20116.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01057, ecapa_loss=0.0001438, whisper_loss=0.09072, over 3901328.16 frames. 
], batch size: 76, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:42:36,158 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-08-18 07:42:44,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3775830.0, ans=0.125 2024-08-18 07:42:50,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3775930.0, ans=0.0 2024-08-18 07:42:51,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-18 07:42:54,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2024-08-18 07:42:56,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3775930.0, ans=0.125 2024-08-18 07:42:56,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3775930.0, ans=0.025 2024-08-18 07:42:56,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.07 vs. 
limit=5.0 2024-08-18 07:43:01,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3776030.0, ans=0.125 2024-08-18 07:43:03,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.317e+01 2.510e+01 2.748e+01 4.518e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 07:43:07,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3776030.0, ans=0.125 2024-08-18 07:43:08,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3776030.0, ans=0.0 2024-08-18 07:43:27,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5500, loss[loss=0.1134, beats_loss=0.009695, ecapa_loss=0.0001275, whisper_loss=0.1025, over 23102.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001434, whisper_loss=0.09074, over 3912275.37 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:43:32,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3776230.0, ans=0.0 2024-08-18 07:43:35,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3776230.0, ans=0.125 2024-08-18 07:43:40,905 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-18 07:44:05,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3776530.0, ans=0.125 2024-08-18 07:44:06,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3776530.0, ans=0.0 2024-08-18 07:44:07,443 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
19 from LS+wenet, 11 from Vox, 23 fro AS 2024-08-18 07:44:11,557 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-08-18 07:44:13,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3776530.0, ans=0.125 2024-08-18 07:44:28,226 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 20 from LS+wenet, 23 from Vox, 51 fro AS 2024-08-18 07:44:29,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5550, loss[loss=0.07726, beats_loss=0.01443, ecapa_loss=0.000133, whisper_loss=0.0615, over 21798.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01057, ecapa_loss=0.0001433, whisper_loss=0.09106, over 3929738.93 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:44:36,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3776730.0, ans=0.0 2024-08-18 07:44:41,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3776830.0, ans=0.0 2024-08-18 07:44:54,464 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 07:45:08,291 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.307e+01 2.560e+01 2.922e+01 7.758e+01, threshold=5.121e+01, percent-clipped=1.0 2024-08-18 07:45:16,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=15.0 2024-08-18 07:45:25,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.92 vs. 
limit=15.0 2024-08-18 07:45:27,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3777130.0, ans=0.125 2024-08-18 07:45:29,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5 2024-08-18 07:45:31,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5600, loss[loss=0.09256, beats_loss=0.0107, ecapa_loss=0.0001357, whisper_loss=0.0805, over 17090.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01059, ecapa_loss=0.0001439, whisper_loss=0.09126, over 3936253.29 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:46:00,464 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-18 07:46:03,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-18 07:46:07,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3777430.0, ans=0.1 2024-08-18 07:46:19,120 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 11 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 07:46:19,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3777530.0, ans=0.125 2024-08-18 07:46:21,936 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 07:46:33,458 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5650, loss[loss=0.1038, beats_loss=0.01052, ecapa_loss=0.0001469, whisper_loss=0.09178, over 20251.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0106, ecapa_loss=0.0001441, whisper_loss=0.09083, over 3946950.85 frames. 
], batch size: 80, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:46:44,798 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-18 07:46:46,014 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 07:47:12,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.355e+01 2.678e+01 2.961e+01 4.241e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-18 07:47:21,075 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 14 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 07:47:21,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.17 vs. limit=22.5 2024-08-18 07:47:24,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3778130.0, ans=0.125 2024-08-18 07:47:33,767 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 07:47:34,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3778230.0, ans=0.125 2024-08-18 07:47:35,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5700, loss[loss=0.1128, beats_loss=0.005355, ecapa_loss=0.0002392, whisper_loss=0.1051, over 15404.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0106, ecapa_loss=0.0001458, whisper_loss=0.0909, over 3938991.02 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:47:46,312 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-18 07:48:00,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.84 vs. 
limit=15.0 2024-08-18 07:48:25,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3778530.0, ans=0.2 2024-08-18 07:48:56,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5750, loss[loss=0.1081, beats_loss=0.01179, ecapa_loss=0.0001183, whisper_loss=0.09507, over 23853.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.0001462, whisper_loss=0.09124, over 3940264.35 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:49:05,210 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 07:49:26,146 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 07:49:53,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3779030.0, ans=0.0 2024-08-18 07:49:54,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.765e+01 2.413e+01 2.637e+01 2.878e+01 7.764e+01, threshold=5.274e+01, percent-clipped=1.0 2024-08-18 07:50:14,831 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 22 from LS+wenet, 23 from Vox, 50 fro AS 2024-08-18 07:50:28,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5800, loss[loss=0.1101, beats_loss=0.007178, ecapa_loss=0.0001798, whisper_loss=0.1011, over 16994.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01048, ecapa_loss=0.000147, whisper_loss=0.09079, over 3906628.57 frames. ], batch size: 69, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:50:39,449 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 07:50:56,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.99 vs. 
limit=15.0 2024-08-18 07:51:26,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3779530.0, ans=0.125 2024-08-18 07:51:27,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3779530.0, ans=0.0 2024-08-18 07:51:37,287 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 07:51:41,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3779530.0, ans=0.125 2024-08-18 07:52:03,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=12.0 2024-08-18 07:52:04,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5850, loss[loss=0.09483, beats_loss=0.01239, ecapa_loss=0.0001073, whisper_loss=0.08137, over 23495.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.000146, whisper_loss=0.09027, over 3924439.08 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:52:29,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3779830.0, ans=0.0 2024-08-18 07:52:35,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3779930.0, ans=0.0 2024-08-18 07:52:36,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-18 07:52:38,625 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
34 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 07:52:40,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3779930.0, ans=0.125 2024-08-18 07:52:50,345 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.256e+01 2.447e+01 2.728e+01 3.298e+01, threshold=4.893e+01, percent-clipped=0.0 2024-08-18 07:52:53,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3780030.0, ans=0.125 2024-08-18 07:53:10,613 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 07:53:15,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3780130.0, ans=0.2 2024-08-18 07:53:17,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5900, loss[loss=0.08399, beats_loss=0.009598, ecapa_loss=0.000163, whisper_loss=0.07276, over 15147.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001465, whisper_loss=0.09004, over 3877494.83 frames. ], batch size: 63, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:53:33,863 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-18 07:53:35,224 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 07:53:40,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3780330.0, ans=0.125 2024-08-18 07:53:52,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3780430.0, ans=0.04949747468305833 2024-08-18 07:54:11,989 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3780530.0, ans=0.125 2024-08-18 07:54:12,993 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 07:54:18,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. limit=12.0 2024-08-18 07:54:23,483 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 32 from Vox, 35 fro AS 2024-08-18 07:54:30,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 5950, loss[loss=0.09918, beats_loss=0.008752, ecapa_loss=0.0001274, whisper_loss=0.08915, over 21609.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001461, whisper_loss=0.09068, over 3917361.31 frames. ], batch size: 83, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:54:42,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3780730.0, ans=0.0 2024-08-18 07:54:42,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3780730.0, ans=0.0 2024-08-18 07:55:06,696 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 07:55:10,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3780930.0, ans=0.125 2024-08-18 07:55:10,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=22.5 2024-08-18 07:55:15,691 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.895e+01 2.332e+01 2.559e+01 2.973e+01 4.806e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-18 07:55:17,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3781030.0, ans=0.2 2024-08-18 07:55:23,625 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-18 07:55:32,667 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 07:55:41,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=3781130.0, ans=15.0 2024-08-18 07:55:45,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6000, loss[loss=0.1108, beats_loss=0.007544, ecapa_loss=0.0001442, whisper_loss=0.1018, over 21701.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001464, whisper_loss=0.09053, over 3929748.92 frames. ], batch size: 83, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:55:45,366 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 07:56:22,541 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on ASR_libri: loss=0.2523, beats_loss=0, ecapa_loss=0.0005188, whisper_loss=0.2471, over 922467.00 frames. 2024-08-18 07:56:37,623 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on SV_voxceleb1: loss=0.004005, beats_loss=0, ecapa_loss=0.0004005, whisper_loss=0, over 939242.00 frames. 
2024-08-18 07:58:26,856 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 07:58:26,860 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 07:58:27,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3781230.0, ans=0.125 2024-08-18 07:58:32,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3781230.0, ans=0.5 2024-08-18 07:58:36,725 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 07:58:39,701 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-18 07:58:43,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3781330.0, ans=0.125 2024-08-18 07:58:44,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3781330.0, ans=0.0 2024-08-18 07:58:53,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3781330.0, ans=0.125 2024-08-18 07:58:56,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3781430.0, ans=0.0 2024-08-18 07:59:37,899 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 07:59:39,129 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6050, loss[loss=0.09705, beats_loss=0.01153, ecapa_loss=0.000163, whisper_loss=0.08388, over 21503.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001452, whisper_loss=0.09054, over 3928859.89 frames. 
], batch size: 89, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 07:59:57,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3781830.0, ans=0.125 2024-08-18 08:00:04,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2024-08-18 08:00:08,427 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 08:00:22,642 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.283e+01 2.534e+01 2.790e+01 4.412e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-18 08:00:28,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3782030.0, ans=0.125 2024-08-18 08:00:28,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3782030.0, ans=0.025 2024-08-18 08:00:35,120 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 35 from Vox, 30 fro AS 2024-08-18 08:00:39,359 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-18 08:00:43,834 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 08:00:48,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3782230.0, ans=0.125 2024-08-18 08:00:49,560 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6100, loss[loss=0.0963, beats_loss=0.01113, ecapa_loss=0.0001644, whisper_loss=0.08352, over 20920.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001453, whisper_loss=0.08963, over 3900892.26 frames. 
], batch size: 87, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:00:56,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3782230.0, ans=0.0 2024-08-18 08:01:11,300 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.93 vs. limit=15.0 2024-08-18 08:01:17,487 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 08:01:51,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3782630.0, ans=0.125 2024-08-18 08:01:54,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.60 vs. limit=10.0 2024-08-18 08:01:57,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3782630.0, ans=0.2 2024-08-18 08:02:02,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6150, loss[loss=0.11, beats_loss=0.009523, ecapa_loss=0.0001421, whisper_loss=0.09904, over 17423.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001451, whisper_loss=0.08995, over 3897987.88 frames. 
], batch size: 66, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:02:03,245 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.355e-02 2024-08-18 08:02:21,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3782830.0, ans=0.5 2024-08-18 08:02:27,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3782830.0, ans=0.125 2024-08-18 08:02:28,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3782830.0, ans=0.125 2024-08-18 08:02:28,886 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.886e+05 2024-08-18 08:02:30,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3782930.0, ans=0.0 2024-08-18 08:02:31,228 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 08:02:32,515 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 08:02:46,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.364e+01 2.615e+01 3.104e+01 2.704e+02, threshold=5.230e+01, percent-clipped=2.0 2024-08-18 08:02:52,153 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 08:02:56,063 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 08:02:56,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=3783030.0, ans=6.0 2024-08-18 08:03:10,714 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 08:03:13,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6200, loss[loss=0.1062, beats_loss=0.01197, ecapa_loss=0.0001332, whisper_loss=0.0929, over 22386.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.000145, whisper_loss=0.08993, over 3903756.62 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:03:15,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3783230.0, ans=0.125 2024-08-18 08:03:23,441 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 08:03:32,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3783330.0, ans=0.2 2024-08-18 08:03:36,693 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:03:40,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3783430.0, ans=0.125 2024-08-18 08:03:53,598 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 08:04:01,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3783530.0, ans=0.125 2024-08-18 08:04:04,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3783530.0, ans=0.1 2024-08-18 08:04:05,326 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
25 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-18 08:04:07,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3783530.0, ans=0.125 2024-08-18 08:04:07,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3783530.0, ans=0.0 2024-08-18 08:04:12,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2024-08-18 08:04:23,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6250, loss[loss=0.08795, beats_loss=0.01299, ecapa_loss=0.0001487, whisper_loss=0.07347, over 18931.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01066, ecapa_loss=0.0001453, whisper_loss=0.0902, over 3928692.79 frames. ], batch size: 80, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:04:55,342 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.212e+01 2024-08-18 08:05:04,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.926e+01 2.253e+01 2.525e+01 2.867e+01 3.662e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-18 08:05:12,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-08-18 08:05:14,974 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 08:05:16,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3784130.0, ans=10.0 2024-08-18 08:05:20,301 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
27 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 08:05:30,940 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6300, loss[loss=0.096, beats_loss=0.01178, ecapa_loss=0.0001308, whisper_loss=0.08291, over 21904.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001465, whisper_loss=0.09025, over 3893640.41 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:05:35,177 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 08:05:42,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3784230.0, ans=0.2 2024-08-18 08:05:43,519 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-18 08:05:50,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3784330.0, ans=0.1 2024-08-18 08:05:50,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.12 vs. limit=15.0 2024-08-18 08:06:39,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3784630.0, ans=0.0 2024-08-18 08:06:41,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6350, loss[loss=0.104, beats_loss=0.00951, ecapa_loss=0.0001249, whisper_loss=0.09322, over 22792.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.0001469, whisper_loss=0.08992, over 3893843.26 frames. ], batch size: 86, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:06:45,364 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 08:07:08,462 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
33 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-18 08:07:10,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.54 vs. limit=22.5 2024-08-18 08:07:22,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3785030.0, ans=0.0 2024-08-18 08:07:24,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.986e+01 2.408e+01 2.616e+01 2.938e+01 2.370e+02, threshold=5.231e+01, percent-clipped=3.0 2024-08-18 08:07:51,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6400, loss[loss=0.1158, beats_loss=0.008979, ecapa_loss=0.0001632, whisper_loss=0.1052, over 13911.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001467, whisper_loss=0.09069, over 3908470.34 frames. ], batch size: 54, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:08:07,429 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-18 08:08:09,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3785330.0, ans=0.1 2024-08-18 08:08:09,926 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.694e+01 2024-08-18 08:08:20,137 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 08:08:27,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3785430.0, ans=0.0 2024-08-18 08:08:27,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3785430.0, ans=0.0 2024-08-18 08:08:28,594 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
24 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-18 08:08:32,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3785430.0, ans=0.125 2024-08-18 08:09:02,637 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6450, loss[loss=0.101, beats_loss=0.01228, ecapa_loss=0.0001204, whisper_loss=0.08749, over 19749.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01055, ecapa_loss=0.0001466, whisper_loss=0.0907, over 3934465.05 frames. ], batch size: 74, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:09:04,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3785730.0, ans=0.0 2024-08-18 08:09:21,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3785830.0, ans=0.125 2024-08-18 08:09:46,280 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.303e+01 2.568e+01 2.899e+01 1.758e+02, threshold=5.135e+01, percent-clipped=1.0 2024-08-18 08:09:49,239 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 08:09:53,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=15.0 2024-08-18 08:09:59,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=3786130.0, ans=15.0 2024-08-18 08:10:10,574 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. limit=10.0 2024-08-18 08:10:15,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6500, loss[loss=0.1072, beats_loss=0.0113, ecapa_loss=0.000147, whisper_loss=0.09446, over 22552.00 frames. 
], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001461, whisper_loss=0.09093, over 3941175.04 frames. ], batch size: 91, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:10:24,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3786230.0, ans=0.0 2024-08-18 08:10:24,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3786230.0, ans=0.125 2024-08-18 08:10:48,485 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0 2024-08-18 08:11:04,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3786530.0, ans=0.0 2024-08-18 08:11:06,013 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:11:06,978 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 08:11:20,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3786630.0, ans=0.0 2024-08-18 08:11:25,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3786730.0, ans=0.0 2024-08-18 08:11:26,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6550, loss[loss=0.09129, beats_loss=0.01073, ecapa_loss=0.0001383, whisper_loss=0.07918, over 18546.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01064, ecapa_loss=0.000145, whisper_loss=0.09079, over 3962047.78 frames. ], batch size: 76, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:11:50,012 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 08:11:54,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3786930.0, ans=22.5 2024-08-18 08:12:09,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+01 2.440e+01 2.660e+01 3.012e+01 4.611e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-18 08:12:16,256 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 22 from Vox, 18 fro AS 2024-08-18 08:12:27,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3787130.0, ans=0.125 2024-08-18 08:12:32,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3787130.0, ans=0.125 2024-08-18 08:12:37,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6600, loss[loss=0.1009, beats_loss=0.01164, ecapa_loss=0.0001393, whisper_loss=0.08789, over 22595.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01052, ecapa_loss=0.0001462, whisper_loss=0.0915, over 3981447.55 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:12:46,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787230.0, ans=0.1 2024-08-18 08:12:54,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3787330.0, ans=0.125 2024-08-18 08:12:54,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.68 vs. 
limit=10.0 2024-08-18 08:12:56,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3787330.0, ans=0.1 2024-08-18 08:13:06,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-18 08:13:09,684 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 08:13:28,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3787530.0, ans=0.1 2024-08-18 08:13:31,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3787530.0, ans=0.0 2024-08-18 08:13:34,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3787630.0, ans=0.0 2024-08-18 08:13:45,219 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 08:13:45,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3787630.0, ans=0.2 2024-08-18 08:13:48,030 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6650, loss[loss=0.09796, beats_loss=0.01218, ecapa_loss=0.0001341, whisper_loss=0.08444, over 22549.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01058, ecapa_loss=0.0001454, whisper_loss=0.09107, over 3977744.24 frames. ], batch size: 94, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:13:53,395 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 08:14:27,708 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
21 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 08:14:31,532 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.309e+01 2.540e+01 2.799e+01 5.437e+01, threshold=5.081e+01, percent-clipped=1.0 2024-08-18 08:14:32,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3788030.0, ans=0.0 2024-08-18 08:14:59,192 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6700, loss[loss=0.1093, beats_loss=0.009419, ecapa_loss=0.0001421, whisper_loss=0.09849, over 16110.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09076, over 3942256.41 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:15:04,690 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 11 from Vox, 33 fro AS 2024-08-18 08:15:11,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3788230.0, ans=0.125 2024-08-18 08:15:22,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3788330.0, ans=0.125 2024-08-18 08:15:36,612 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 08:15:42,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3788530.0, ans=0.2 2024-08-18 08:15:42,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3788530.0, ans=0.125 2024-08-18 08:16:00,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3788630.0, ans=0.0 2024-08-18 08:16:04,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3788630.0, ans=0.125 2024-08-18 08:16:08,897 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6750, loss[loss=0.1397, beats_loss=0.005687, ecapa_loss=0.0001676, whisper_loss=0.1324, over 15209.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001465, whisper_loss=0.09107, over 3916493.63 frames. ], batch size: 56, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:16:13,688 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 08:16:22,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3788830.0, ans=0.1 2024-08-18 08:16:35,772 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 08:16:39,578 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 08:16:52,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.278e+01 2.601e+01 3.008e+01 1.413e+02, threshold=5.202e+01, percent-clipped=1.0 2024-08-18 08:17:00,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3789030.0, ans=0.125 2024-08-18 08:17:18,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6800, loss[loss=0.1012, beats_loss=0.01075, ecapa_loss=0.000158, whisper_loss=0.08882, over 22038.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01054, ecapa_loss=0.0001467, whisper_loss=0.0909, over 3906842.84 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:17:27,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3789230.0, ans=0.1 2024-08-18 08:17:28,091 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 08:17:40,309 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-18 08:17:44,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=22.5 2024-08-18 08:17:58,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=3789430.0, ans=0.2 2024-08-18 08:18:12,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3789530.0, ans=0.0 2024-08-18 08:18:14,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3789630.0, ans=0.0 2024-08-18 08:18:16,983 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
16 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 08:18:27,561 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6850, loss[loss=0.1032, beats_loss=0.011, ecapa_loss=0.0001272, whisper_loss=0.09093, over 15670.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001472, whisper_loss=0.09096, over 3889274.59 frames. ], batch size: 60, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:18:29,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3789730.0, ans=0.125 2024-08-18 08:18:34,186 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-18 08:18:37,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3789730.0, ans=0.125 2024-08-18 08:18:45,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3789830.0, ans=0.2 2024-08-18 08:18:51,747 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 08:19:07,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3790030.0, ans=0.2 2024-08-18 08:19:09,386 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.277e+01 2.530e+01 2.768e+01 4.068e+01, threshold=5.060e+01, percent-clipped=0.0 2024-08-18 08:19:33,747 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 08:19:36,691 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6900, loss[loss=0.08537, beats_loss=0.01035, ecapa_loss=0.0001482, whisper_loss=0.07354, over 20761.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01054, ecapa_loss=0.0001453, whisper_loss=0.09113, over 3898234.26 frames. 
], batch size: 87, lr: 2.36e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:19:45,079 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-18 08:19:57,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=15.0 2024-08-18 08:20:06,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3790430.0, ans=0.2 2024-08-18 08:20:17,850 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0 2024-08-18 08:20:18,862 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 08:20:19,910 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 08:20:22,357 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 13 from Vox, 42 fro AS 2024-08-18 08:20:27,136 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.424e+05 2024-08-18 08:20:45,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-08-18 08:20:45,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 6950, loss[loss=0.1151, beats_loss=0.009693, ecapa_loss=0.0001436, whisper_loss=0.104, over 22757.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.0001452, whisper_loss=0.09095, over 3892089.19 frames. 
], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:20:51,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3790730.0, ans=0.125
2024-08-18 08:20:53,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3790730.0, ans=0.5
2024-08-18 08:21:10,174 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.62 vs. limit=15.0
2024-08-18 08:21:12,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3790930.0, ans=0.0
2024-08-18 08:21:18,806 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 from AS
2024-08-18 08:21:21,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3790930.0, ans=0.0
2024-08-18 08:21:27,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=3791030.0, ans=0.04949747468305833
2024-08-18 08:21:28,900 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.281e+01 2.543e+01 2.749e+01 3.905e+01, threshold=5.086e+01, percent-clipped=0.0
2024-08-18 08:21:45,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3791130.0, ans=0.125
2024-08-18 08:21:51,786 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS
2024-08-18 08:21:54,203 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7000, loss[loss=0.0862, beats_loss=0.01084, ecapa_loss=0.0001423, whisper_loss=0.07394, over 16487.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01066, ecapa_loss=0.0001449, whisper_loss=0.09051, over 3910129.00 frames. ], batch size: 64, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:21:56,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-08-18 08:22:24,993 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 20 from Vox, 40 from AS
2024-08-18 08:22:37,384 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 22 from Vox, 32 from AS
2024-08-18 08:22:53,482 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 20 from Vox, 35 from AS
2024-08-18 08:22:59,586 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.752e+05
2024-08-18 08:23:04,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7050, loss[loss=0.0955, beats_loss=0.01121, ecapa_loss=0.0001266, whisper_loss=0.08302, over 23417.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01065, ecapa_loss=0.0001444, whisper_loss=0.09086, over 3913145.12 frames. ], batch size: 93, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:23:21,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=3791830.0, ans=15.0
2024-08-18 08:23:28,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3791830.0, ans=0.0
2024-08-18 08:23:31,211 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 28 from LS+wenet, 15 from Vox, 24 from AS
2024-08-18 08:23:36,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3791930.0, ans=0.125
2024-08-18 08:23:39,654 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 34 from LS+wenet, 17 from Vox, 33 from AS
2024-08-18 08:23:47,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.689e+01 2.340e+01 2.598e+01 2.847e+01 9.151e+01, threshold=5.195e+01, percent-clipped=1.0
2024-08-18 08:23:52,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5
2024-08-18 08:24:07,374 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 from AS
2024-08-18 08:24:10,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3792130.0, ans=0.125
2024-08-18 08:24:13,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7100, loss[loss=0.1225, beats_loss=0.01041, ecapa_loss=0.0001419, whisper_loss=0.1107, over 22462.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001446, whisper_loss=0.09075, over 3878507.96 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:24:26,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0
2024-08-18 08:24:38,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3792330.0, ans=0.0
2024-08-18 08:24:51,655 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 from AS
2024-08-18 08:24:57,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3792430.0, ans=0.125
2024-08-18 08:25:10,792 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
35 from LS+wenet, 25 from Vox, 33 from AS
2024-08-18 08:25:20,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3792630.0, ans=0.09899494936611666
2024-08-18 08:25:29,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7150, loss[loss=0.09778, beats_loss=0.01111, ecapa_loss=0.0001212, whisper_loss=0.08546, over 20748.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001447, whisper_loss=0.09029, over 3895622.99 frames. ], batch size: 81, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:26:15,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=3793030.0, ans=0.0
2024-08-18 08:26:15,860 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.780e+01 2.213e+01 2.415e+01 2.691e+01 1.069e+02, threshold=4.830e+01, percent-clipped=1.0
2024-08-18 08:26:18,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3793030.0, ans=0.125
2024-08-18 08:26:25,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3793030.0, ans=0.1
2024-08-18 08:26:29,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3793130.0, ans=0.0
2024-08-18 08:26:44,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7200, loss[loss=0.09675, beats_loss=0.01017, ecapa_loss=0.00011, whisper_loss=0.08549, over 17620.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001447, whisper_loss=0.09003, over 3876121.63 frames. ], batch size: 66, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:26:47,329 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 from AS
2024-08-18 08:26:52,351 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 from AS
2024-08-18 08:26:54,994 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS
2024-08-18 08:27:29,917 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 24 from Vox, 39 from AS
2024-08-18 08:27:42,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3793530.0, ans=0.0
2024-08-18 08:27:47,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3793630.0, ans=0.125
2024-08-18 08:28:03,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7250, loss[loss=0.09415, beats_loss=0.01138, ecapa_loss=0.0001325, whisper_loss=0.08144, over 22066.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01061, ecapa_loss=0.0001446, whisper_loss=0.08983, over 3869937.80 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:28:23,739 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 16 from Vox, 44 from AS
2024-08-18 08:28:34,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3793830.0, ans=0.125
2024-08-18 08:28:43,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0
2024-08-18 08:28:46,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3793930.0, ans=0.025
2024-08-18 08:28:48,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3793930.0, ans=0.025
2024-08-18 08:28:55,563 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.291e+01 2.562e+01 2.924e+01 4.395e+01, threshold=5.124e+01, percent-clipped=0.0
2024-08-18 08:29:21,113 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0
2024-08-18 08:29:23,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3794130.0, ans=0.0
2024-08-18 08:29:24,807 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 24 from Vox, 25 from AS
2024-08-18 08:29:26,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7300, loss[loss=0.09528, beats_loss=0.01042, ecapa_loss=0.0001619, whisper_loss=0.08324, over 16309.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001444, whisper_loss=0.09007, over 3886546.68 frames. ], batch size: 67, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:30:08,914 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 33 from LS+wenet, 18 from Vox, 24 from AS
2024-08-18 08:30:11,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3794530.0, ans=0.1
2024-08-18 08:30:39,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7350, loss[loss=0.1066, beats_loss=0.01037, ecapa_loss=0.0001169, whisper_loss=0.09508, over 22550.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01058, ecapa_loss=0.0001447, whisper_loss=0.08994, over 3846115.36 frames.
], batch size: 88, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:30:39,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3794730.0, ans=0.1
2024-08-18 08:30:42,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3794730.0, ans=0.2
2024-08-18 08:31:00,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3794830.0, ans=0.0
2024-08-18 08:31:10,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.94 vs. limit=15.0
2024-08-18 08:31:21,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.256e+01 2.556e+01 3.026e+01 2.430e+02, threshold=5.112e+01, percent-clipped=3.0
2024-08-18 08:31:25,111 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 23 from Vox, 23 from AS
2024-08-18 08:31:43,422 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 08:31:48,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7400, loss[loss=0.08455, beats_loss=0.01258, ecapa_loss=0.0001691, whisper_loss=0.07028, over 20826.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001462, whisper_loss=0.09005, over 3879778.61 frames. ], batch size: 92, lr: 2.36e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:32:16,037 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-18 08:32:18,725 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 from AS
2024-08-18 08:32:22,601 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 from AS
2024-08-18 08:32:25,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3795430.0, ans=0.0
2024-08-18 08:32:28,501 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 16 from Vox, 23 from AS
2024-08-18 08:32:59,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3795730.0, ans=0.125
2024-08-18 08:33:00,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7450, loss[loss=0.09551, beats_loss=0.0113, ecapa_loss=0.0001537, whisper_loss=0.08267, over 20700.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001471, whisper_loss=0.09041, over 3898603.25 frames. ], batch size: 88, lr: 2.36e-03, grad_scale: 1.152921504606847e+18
2024-08-18 08:33:10,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3795730.0, ans=0.125
2024-08-18 08:33:36,869 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 22 from Vox, 43 from AS
2024-08-18 08:33:38,027 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 14 from Vox, 31 from AS
2024-08-18 08:33:45,458 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.298e+01 2.515e+01 2.831e+01 4.675e+01, threshold=5.030e+01, percent-clipped=0.0
2024-08-18 08:33:45,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3796030.0, ans=0.09899494936611666
2024-08-18 08:33:50,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796030.0, ans=0.1
2024-08-18 08:33:53,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3796030.0, ans=0.125
2024-08-18 08:33:58,132 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.93 vs. limit=15.0
2024-08-18 08:34:07,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3796130.0, ans=0.125
2024-08-18 08:34:12,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7500, loss[loss=0.1043, beats_loss=0.01153, ecapa_loss=0.0001363, whisper_loss=0.09146, over 22678.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001465, whisper_loss=0.08973, over 3872300.05 frames.
], batch size: 90, lr: 2.35e-03, grad_scale: 1.152921504606847e+18
2024-08-18 08:34:14,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796230.0, ans=0.1
2024-08-18 08:34:18,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3796230.0, ans=0.1
2024-08-18 08:34:28,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3796330.0, ans=0.0
2024-08-18 08:34:29,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3796330.0, ans=0.1
2024-08-18 08:34:53,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3796530.0, ans=0.0
2024-08-18 08:35:09,674 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.823e+05
2024-08-18 08:35:10,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3796630.0, ans=0.125
2024-08-18 08:35:15,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3796630.0, ans=0.125
2024-08-18 08:35:15,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0
2024-08-18 08:35:21,332 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7550, loss[loss=0.08272, beats_loss=0.01271, ecapa_loss=0.0001487, whisper_loss=0.06852, over 21029.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001464, whisper_loss=0.09025, over 3855596.93 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 1.152921504606847e+18
2024-08-18 08:35:37,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3796830.0, ans=0.125
2024-08-18 08:35:53,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=3796930.0, ans=15.0
2024-08-18 08:36:01,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.289e+01 2.559e+01 2.836e+01 1.504e+02, threshold=5.118e+01, percent-clipped=1.0
2024-08-18 08:36:03,998 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04845663160085678, model_norm_threshold=51.1837158203125
2024-08-18 08:36:04,160 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.631e+05, grad_sumsq=1.588e+07, orig_rms_sq=1.027e-02
2024-08-18 08:36:09,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3797030.0, ans=0.125
2024-08-18 08:36:13,318 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 16 from Vox, 41 from AS
2024-08-18 08:36:26,281 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7600, loss[loss=0.08887, beats_loss=0.01238, ecapa_loss=9.686e-05, whisper_loss=0.07552, over 16321.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001461, whisper_loss=0.09004, over 3843923.61 frames. ], batch size: 62, lr: 2.35e-03, grad_scale: 1.152921504606847e+18
2024-08-18 08:36:26,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3797230.0, ans=0.2
2024-08-18 08:36:28,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3797230.0, ans=0.125
2024-08-18 08:36:41,599 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 18 from Vox, 30 from AS
2024-08-18 08:36:47,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0
2024-08-18 08:36:54,199 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 from AS
2024-08-18 08:37:00,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=3797430.0, ans=0.5
2024-08-18 08:37:10,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3797530.0, ans=0.125
2024-08-18 08:37:11,212 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 17 from Vox, 29 from AS
2024-08-18 08:37:15,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3797530.0, ans=22.5
2024-08-18 08:37:16,025 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 from AS
2024-08-18 08:37:18,410 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS
2024-08-18 08:37:18,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3797530.0, ans=0.1
2024-08-18 08:37:21,430 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
23 from LS+wenet, 17 from Vox, 47 from AS
2024-08-18 08:37:29,666 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 25 from Vox, 41 from AS
2024-08-18 08:37:35,299 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7650, loss[loss=0.09763, beats_loss=0.01381, ecapa_loss=0.0001216, whisper_loss=0.0826, over 17165.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001448, whisper_loss=0.09058, over 3889428.95 frames. ], batch size: 69, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:37:35,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3797730.0, ans=0.2
2024-08-18 08:37:43,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3797730.0, ans=0.1
2024-08-18 08:37:47,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3797830.0, ans=0.1
2024-08-18 08:37:51,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3797830.0, ans=0.2
2024-08-18 08:37:57,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3797830.0, ans=0.0
2024-08-18 08:37:59,177 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.060e-02
2024-08-18 08:38:01,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3797930.0, ans=0.125
2024-08-18 08:38:14,689 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 22 from Vox, 40 from AS
2024-08-18 08:38:18,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.494e+01 2.701e+01 3.087e+01 1.056e+03, threshold=5.401e+01, percent-clipped=1.0
2024-08-18 08:38:37,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=12.0
2024-08-18 08:38:41,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7700, loss[loss=0.1087, beats_loss=0.01071, ecapa_loss=0.0001514, whisper_loss=0.09651, over 21945.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001438, whisper_loss=0.09015, over 3921164.75 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:38:43,686 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 19 from Vox, 37 from AS
2024-08-18 08:39:02,314 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 from AS
2024-08-18 08:39:05,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3798430.0, ans=0.0
2024-08-18 08:39:06,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3798430.0, ans=0.2
2024-08-18 08:39:07,579 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 from AS
2024-08-18 08:39:09,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3798430.0, ans=0.1
2024-08-18 08:39:12,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0
2024-08-18 08:39:15,191 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 30 from LS+wenet, 12 from Vox, 25 from AS
2024-08-18 08:39:45,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7750, loss[loss=0.1002, beats_loss=0.01126, ecapa_loss=0.0001552, whisper_loss=0.08737, over 16551.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001443, whisper_loss=0.09071, over 3925050.35 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:39:47,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3798730.0, ans=0.0
2024-08-18 08:39:51,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0
2024-08-18 08:39:58,450 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS
2024-08-18 08:39:58,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3798830.0, ans=0.125
2024-08-18 08:40:07,777 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 18 from Vox, 36 from AS
2024-08-18 08:40:09,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3798830.0, ans=0.2
2024-08-18 08:40:14,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3798930.0, ans=0.2
2024-08-18 08:40:18,117 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 from AS
2024-08-18 08:40:23,573 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 13 from Vox, 28 from AS
2024-08-18 08:40:27,848 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.319e+01 2.612e+01 2.885e+01 4.256e+01, threshold=5.223e+01, percent-clipped=0.0
2024-08-18 08:40:42,211 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
29 from LS+wenet, 20 from Vox, 42 from AS
2024-08-18 08:40:51,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7800, loss[loss=0.09771, beats_loss=0.009832, ecapa_loss=0.0001645, whisper_loss=0.08623, over 18978.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.08986, over 3894598.44 frames. ], batch size: 78, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:40:52,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0
2024-08-18 08:41:20,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3799430.0, ans=0.0
2024-08-18 08:41:27,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0
2024-08-18 08:41:42,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3799530.0, ans=0.125
2024-08-18 08:41:50,568 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 from AS
2024-08-18 08:41:58,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7850, loss[loss=0.1005, beats_loss=0.008637, ecapa_loss=0.0001476, whisper_loss=0.09035, over 15979.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001447, whisper_loss=0.08997, over 3904171.27 frames. ], batch size: 60, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:42:02,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3799730.0, ans=0.5
2024-08-18 08:42:28,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3799930.0, ans=0.05
2024-08-18 08:42:36,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3799930.0, ans=0.125
2024-08-18 08:42:42,480 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.955e+01 2.336e+01 2.580e+01 3.031e+01 8.251e+01, threshold=5.160e+01, percent-clipped=2.0
2024-08-18 08:42:46,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3800030.0, ans=0.2
2024-08-18 08:42:55,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3800130.0, ans=0.125
2024-08-18 08:42:55,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3800130.0, ans=0.125
2024-08-18 08:42:58,986 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 23 from Vox, 28 from AS
2024-08-18 08:43:05,398 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7900, loss[loss=0.1024, beats_loss=0.009735, ecapa_loss=0.0001321, whisper_loss=0.09136, over 19743.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001451, whisper_loss=0.09001, over 3865027.50 frames. ], batch size: 76, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:43:06,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3800230.0, ans=0.125
2024-08-18 08:43:21,293 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 19 from Vox, 17 from AS
2024-08-18 08:43:24,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3800330.0, ans=0.07
2024-08-18 08:43:29,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0
2024-08-18 08:43:31,784 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 from AS
2024-08-18 08:43:36,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3800430.0, ans=0.1
2024-08-18 08:43:44,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5
2024-08-18 08:43:53,683 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 22 from Vox, 28 from AS
2024-08-18 08:43:56,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0
2024-08-18 08:44:04,252 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 14 from Vox, 29 from AS
2024-08-18 08:44:05,803 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
20 from LS+wenet, 17 from Vox, 42 from AS
2024-08-18 08:44:06,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3800630.0, ans=0.125
2024-08-18 08:44:07,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3800630.0, ans=0.125
2024-08-18 08:44:10,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3800730.0, ans=0.125
2024-08-18 08:44:11,682 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 7950, loss[loss=0.1102, beats_loss=0.01031, ecapa_loss=0.0001299, whisper_loss=0.09858, over 22002.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001451, whisper_loss=0.0903, over 3866978.34 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:44:11,796 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 from AS
2024-08-18 08:44:16,844 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 from AS
2024-08-18 08:44:25,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3800830.0, ans=0.2
2024-08-18 08:44:29,997 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 from AS
2024-08-18 08:44:41,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3800930.0, ans=0.125
2024-08-18 08:44:56,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.349e+01 2.585e+01 3.062e+01 4.002e+02, threshold=5.169e+01, percent-clipped=3.0
2024-08-18 08:45:10,155 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 20 from Vox, 34 from AS
2024-08-18 08:45:20,252 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS
2024-08-18 08:45:21,679 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8000, loss[loss=0.1153, beats_loss=0.01099, ecapa_loss=0.0001388, whisper_loss=0.1029, over 22993.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001447, whisper_loss=0.09031, over 3838420.43 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:45:35,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3801330.0, ans=0.125
2024-08-18 08:45:43,727 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 20 from Vox, 33 from AS
2024-08-18 08:46:22,179 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 21 from Vox, 28 from AS
2024-08-18 08:46:30,577 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8050, loss[loss=0.09449, beats_loss=0.01155, ecapa_loss=0.0001539, whisper_loss=0.08141, over 22373.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001457, whisper_loss=0.09015, over 3804442.24 frames. ], batch size: 93, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:46:33,330 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 24 from Vox, 30 from AS
2024-08-18 08:46:54,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3801830.0, ans=0.07
2024-08-18 08:47:07,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3801930.0, ans=0.125
2024-08-18 08:47:07,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.18 vs. limit=6.0
2024-08-18 08:47:14,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3802030.0, ans=0.0
2024-08-18 08:47:14,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.216e+01 2.436e+01 2.750e+01 5.017e+01, threshold=4.873e+01, percent-clipped=0.0
2024-08-18 08:47:38,712 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8100, loss[loss=0.1346, beats_loss=0.008121, ecapa_loss=0.0001549, whisper_loss=0.125, over 23629.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001452, whisper_loss=0.09073, over 3842022.81 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:47:51,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3802330.0, ans=0.0
2024-08-18 08:48:21,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3802530.0, ans=0.125
2024-08-18 08:48:49,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8150, loss[loss=0.1078, beats_loss=0.01026, ecapa_loss=0.0001332, whisper_loss=0.09626, over 16957.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001448, whisper_loss=0.09078, over 3858492.08 frames. ], batch size: 68, lr: 2.35e-03, grad_scale: 5.764607523034235e+17
2024-08-18 08:48:54,217 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 18 from Vox, 32 from AS
2024-08-18 08:48:54,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3802730.0, ans=0.1
2024-08-18 08:48:54,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5
2024-08-18 08:48:57,081 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
21 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 08:48:59,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3802730.0, ans=0.07 2024-08-18 08:49:03,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3802830.0, ans=0.125 2024-08-18 08:49:06,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3802830.0, ans=0.0 2024-08-18 08:49:27,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3802930.0, ans=0.2 2024-08-18 08:49:30,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3802930.0, ans=0.0 2024-08-18 08:49:35,784 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.360e+01 2.577e+01 2.939e+01 1.297e+02, threshold=5.154e+01, percent-clipped=2.0 2024-08-18 08:49:39,352 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 08:50:01,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8200, loss[loss=0.1293, beats_loss=0.009886, ecapa_loss=0.0001501, whisper_loss=0.1179, over 22128.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001455, whisper_loss=0.09092, over 3881398.08 frames. ], batch size: 86, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:50:16,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=3803330.0, ans=0.02 2024-08-18 08:50:21,295 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 08:50:25,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3803330.0, ans=0.2 2024-08-18 08:50:30,078 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 08:50:39,970 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 08:50:41,944 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 24 from Vox, 45 fro AS 2024-08-18 08:50:45,203 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 08:50:46,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.65 vs. limit=10.0 2024-08-18 08:50:52,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3803530.0, ans=0.125 2024-08-18 08:50:53,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3803530.0, ans=0.1 2024-08-18 08:51:05,267 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-18 08:51:12,503 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-18 08:51:15,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8250, loss[loss=0.08758, beats_loss=0.009871, ecapa_loss=0.0001748, whisper_loss=0.07596, over 16110.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.000145, whisper_loss=0.09075, over 3883922.92 frames. 
], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:51:41,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3803830.0, ans=0.0 2024-08-18 08:51:58,910 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 08:52:02,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.328e+01 2.473e+01 2.837e+01 4.238e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-18 08:52:06,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=3804030.0, ans=0.125 2024-08-18 08:52:29,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8300, loss[loss=0.1078, beats_loss=0.01008, ecapa_loss=0.0001063, whisper_loss=0.09663, over 17449.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.000144, whisper_loss=0.09035, over 3889899.83 frames. ], batch size: 65, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:52:32,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-18 08:52:36,791 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
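The `train_multi_KD3.py:844` lines report how each mini-batch is split across the three data sources; note that `fro AS` is a typo emitted by the logging call itself, so any parser has to match it verbatim. A sketch (regex and function names are my own):

```python
import re

# "fro AS" matches the typo in the logging call verbatim.
CUTS_RE = re.compile(
    r"A total of (\d+) cuts\. (\d+) from LS\+wenet, (\d+) from Vox, (\d+) fro AS"
)

def parse_cuts(line: str):
    m = CUTS_RE.search(line)
    if m is None:
        return None
    total, ls, vox, audioset = map(int, m.groups())
    return {"total": total, "LS+wenet": ls, "Vox": vox, "AS": audioset}

cuts = parse_cuts("A total of 86 cuts. 24 from LS+wenet, 19 from Vox, 43 fro AS")
# The three per-source counts always add up to the reported total.
assert cuts["LS+wenet"] + cuts["Vox"] + cuts["AS"] == cuts["total"]
```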
24 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 08:52:39,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3804230.0, ans=0.125 2024-08-18 08:52:56,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3804330.0, ans=0.125 2024-08-18 08:53:05,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3804430.0, ans=0.125 2024-08-18 08:53:18,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3804530.0, ans=0.5 2024-08-18 08:53:26,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3804530.0, ans=0.125 2024-08-18 08:53:35,271 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.616e+01 2024-08-18 08:53:48,467 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8350, loss[loss=0.08056, beats_loss=0.01141, ecapa_loss=0.0001632, whisper_loss=0.06752, over 20798.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001441, whisper_loss=0.0901, over 3898251.92 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:54:01,581 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
25 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 08:54:09,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3804830.0, ans=0.125 2024-08-18 08:54:09,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3804830.0, ans=0.2 2024-08-18 08:54:23,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3804930.0, ans=0.0 2024-08-18 08:54:27,227 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-08-18 08:54:34,888 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.322e+01 2.536e+01 2.820e+01 4.636e+01, threshold=5.072e+01, percent-clipped=0.0 2024-08-18 08:54:38,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3805030.0, ans=0.0 2024-08-18 08:54:41,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.85 vs. 
limit=15.0 2024-08-18 08:54:42,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3805030.0, ans=0.1 2024-08-18 08:54:47,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3805130.0, ans=0.5 2024-08-18 08:54:58,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3805130.0, ans=0.125 2024-08-18 08:54:58,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3805130.0, ans=0.1 2024-08-18 08:55:03,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8400, loss[loss=0.1027, beats_loss=0.01075, ecapa_loss=0.0001494, whisper_loss=0.09047, over 21850.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.0001456, whisper_loss=0.09106, over 3921008.84 frames. ], batch size: 86, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:55:32,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3805330.0, ans=10.0 2024-08-18 08:55:52,347 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 08:56:11,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3805630.0, ans=0.125 2024-08-18 08:56:23,597 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8450, loss[loss=0.09496, beats_loss=0.009458, ecapa_loss=0.0001459, whisper_loss=0.08405, over 16355.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01034, ecapa_loss=0.000146, whisper_loss=0.09132, over 3933232.45 frames. 
], batch size: 64, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:56:27,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2024-08-18 08:56:39,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3805830.0, ans=0.1 2024-08-18 08:57:28,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3805930.0, ans=0.0 2024-08-18 08:57:45,546 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.360e+01 2.600e+01 3.017e+01 9.268e+01, threshold=5.200e+01, percent-clipped=1.0 2024-08-18 08:58:13,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3806130.0, ans=0.07 2024-08-18 08:58:16,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8500, loss[loss=0.08752, beats_loss=0.01353, ecapa_loss=0.0001446, whisper_loss=0.07255, over 20209.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001468, whisper_loss=0.09069, over 3935701.15 frames. ], batch size: 84, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:58:17,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3806230.0, ans=0.125 2024-08-18 08:58:47,946 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 08:58:50,182 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 08:58:53,270 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
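The `optim.py:476` lines print five grad-norm statistics (minimum, the three quartiles, maximum), a clipping threshold, and the percentage of steps clipped. The numbers are consistent with the threshold being `Clipping_scale` times the median over a recent window (e.g. 2.0 × 2.600e+01 = 5.200e+01 in the record above); the helper below reconstructs the summary under that assumption and is not the icefall source:

```python
import statistics

def clipping_stats(grad_norms, clipping_scale=2.0):
    """Reconstruct the optim.py summary: five-number grad-norm spread,
    a threshold of clipping_scale * median, and the percent of steps
    whose norm exceeded it (assumed semantics, not the icefall code)."""
    q25, med, q75 = statistics.quantiles(grad_norms, n=4)
    threshold = clipping_scale * med
    pct = 100.0 * sum(n > threshold for n in grad_norms) / len(grad_norms)
    return {
        "spread": (min(grad_norms), q25, med, q75, max(grad_norms)),
        "threshold": threshold,
        "percent_clipped": pct,
    }

stats = clipping_stats([18.1, 22.2, 24.4, 27.5, 50.2])
```

With this toy window the threshold is 2 × 24.4 = 48.8, and only the 50.2 outlier exceeds it, matching the pattern of occasional `percent-clipped` > 0 in the log.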
18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 08:59:11,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3806530.0, ans=0.1 2024-08-18 08:59:24,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3806630.0, ans=0.1 2024-08-18 08:59:37,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8550, loss[loss=0.09992, beats_loss=0.008942, ecapa_loss=0.0001142, whisper_loss=0.08984, over 19695.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001455, whisper_loss=0.09, over 3893709.58 frames. ], batch size: 74, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 08:59:53,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3806830.0, ans=0.125 2024-08-18 09:00:27,965 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.270e+01 2.542e+01 2.954e+01 6.029e+01, threshold=5.084e+01, percent-clipped=1.0 2024-08-18 09:00:47,820 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-18 09:00:55,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8600, loss[loss=0.1134, beats_loss=0.009673, ecapa_loss=0.0001655, whisper_loss=0.1021, over 14456.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01045, ecapa_loss=0.0001455, whisper_loss=0.09021, over 3881180.59 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:01:33,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3807430.0, ans=10.0 2024-08-18 09:01:35,204 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. 
limit=15.0 2024-08-18 09:01:37,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3807430.0, ans=0.1 2024-08-18 09:01:39,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3807530.0, ans=0.125 2024-08-18 09:02:04,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3807630.0, ans=0.1 2024-08-18 09:02:09,282 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8650, loss[loss=0.0992, beats_loss=0.01027, ecapa_loss=0.0001588, whisper_loss=0.08734, over 22275.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.000146, whisper_loss=0.09008, over 3915488.01 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:02:12,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3807730.0, ans=0.0 2024-08-18 09:02:13,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.18 vs. limit=10.0 2024-08-18 09:02:20,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3807730.0, ans=0.0 2024-08-18 09:02:21,765 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-18 09:02:31,833 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 09:02:34,159 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 09:02:35,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3807830.0, ans=0.125 2024-08-18 09:02:54,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3808030.0, ans=0.125 2024-08-18 09:02:56,377 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.256e+01 2.496e+01 2.847e+01 1.710e+02, threshold=4.992e+01, percent-clipped=2.0 2024-08-18 09:03:15,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3808130.0, ans=0.1 2024-08-18 09:03:20,051 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 09:03:22,267 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8700, loss[loss=0.08424, beats_loss=0.01021, ecapa_loss=0.0001462, whisper_loss=0.07256, over 14248.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001463, whisper_loss=0.0906, over 3929973.42 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:03:30,995 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 09:03:32,440 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 09:04:09,059 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 32 from Vox, 30 fro AS 2024-08-18 09:04:20,730 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-18 09:04:32,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8750, loss[loss=0.09633, beats_loss=0.007422, ecapa_loss=0.000146, whisper_loss=0.08745, over 18250.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001465, whisper_loss=0.09058, over 3915698.14 frames. 
], batch size: 70, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:04:36,237 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 09:04:43,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.20 vs. limit=10.0 2024-08-18 09:04:45,680 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 09:05:00,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3808930.0, ans=0.125 2024-08-18 09:05:04,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2024-08-18 09:05:15,044 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.311e+01 2.590e+01 2.877e+01 4.936e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 09:05:26,737 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-18 09:05:32,340 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 09:05:35,249 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.39 vs. limit=15.0 2024-08-18 09:05:38,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8800, loss[loss=0.08932, beats_loss=0.01232, ecapa_loss=0.0001488, whisper_loss=0.07551, over 21992.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001472, whisper_loss=0.09093, over 3905151.48 frames. 
], batch size: 93, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:05:40,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3809230.0, ans=0.125 2024-08-18 09:05:54,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3809330.0, ans=0.0 2024-08-18 09:06:04,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3809430.0, ans=0.2 2024-08-18 09:06:10,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3809430.0, ans=0.0 2024-08-18 09:06:15,724 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.405e+05 2024-08-18 09:06:19,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3809530.0, ans=0.125 2024-08-18 09:06:26,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3809530.0, ans=0.125 2024-08-18 09:06:42,311 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8850, loss[loss=0.0972, beats_loss=0.01268, ecapa_loss=0.0001145, whisper_loss=0.08338, over 14399.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001455, whisper_loss=0.09027, over 3904378.93 frames. ], batch size: 56, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:06:49,957 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 09:06:53,883 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 09:06:58,005 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
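The many `scaling.py:214` lines track `ScheduledFloat` parameters: values that change as a piecewise-linear function of `batch_count` between fixed breakpoints. A minimal re-sketch follows; the schedule in it is invented for illustration (the real breakpoints live in the model code, and by `batch_count` ≈ 3.8e6 every logged `ans` has long since gone flat):

```python
# Minimal re-sketch of a ScheduledFloat: a value that is a piecewise-linear
# function of batch_count, defined by sorted (batch_count, value) breakpoints.
def scheduled_float(batch_count: float, schedule) -> float:
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)

# e.g. a skip-rate decaying from 0.5 to 0.0 over the first 20k batches
# (an illustrative schedule, not one taken from the model code):
sched = [(0.0, 0.5), (20000.0, 0.0)]
```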
25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 09:07:08,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3809930.0, ans=0.1 2024-08-18 09:07:23,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.147e+01 2.440e+01 2.882e+01 4.203e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-18 09:07:41,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3810130.0, ans=0.0 2024-08-18 09:07:41,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-08-18 09:07:47,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8900, loss[loss=0.1111, beats_loss=0.01018, ecapa_loss=0.000163, whisper_loss=0.09931, over 21083.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001458, whisper_loss=0.08971, over 3912222.19 frames. ], batch size: 85, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:07:47,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3810230.0, ans=0.0 2024-08-18 09:07:53,469 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 09:08:04,922 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 09:08:10,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2024-08-18 09:08:23,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=12.0 2024-08-18 09:08:26,009 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 09:08:39,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2024-08-18 09:08:53,011 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 8950, loss[loss=0.09387, beats_loss=0.008169, ecapa_loss=0.0001562, whisper_loss=0.08413, over 22156.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001456, whisper_loss=0.09007, over 3892524.80 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:09:09,623 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 09:09:11,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3810830.0, ans=0.2 2024-08-18 09:09:19,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3810930.0, ans=0.125 2024-08-18 09:09:35,385 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.997e+01 2.305e+01 2.570e+01 2.937e+01 4.370e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-18 09:09:45,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3811130.0, ans=0.125 2024-08-18 09:09:59,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9000, loss[loss=0.1011, beats_loss=0.01183, ecapa_loss=0.0001388, whisper_loss=0.08786, over 15995.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01053, ecapa_loss=0.0001456, whisper_loss=0.09025, over 3879780.90 frames. 
], batch size: 65, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:09:59,358 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 09:10:40,565 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005276, whisper_loss=0.2478, over 922467.00 frames. 2024-08-18 09:10:56,566 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on SV_voxceleb1: loss=0.004116, beats_loss=0, ecapa_loss=0.0004116, whisper_loss=0, over 939242.00 frames. 2024-08-18 09:12:49,861 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on AT_audioset: loss=0.02315, beats_loss=0.02315, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 09:12:49,865 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 09:12:54,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3811230.0, ans=0.0 2024-08-18 09:13:14,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3811330.0, ans=0.0 2024-08-18 09:13:40,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3811530.0, ans=0.125 2024-08-18 09:13:42,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3811630.0, ans=0.0 2024-08-18 09:13:49,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3811630.0, ans=0.0 2024-08-18 09:13:49,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=22.5 2024-08-18 09:13:50,118 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
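The three validation rows at batch 9000 each exercise a single task, so the unused components are exactly zero, and each printed `loss` is the same scaled sum used in training: for `SV_voxceleb1`, 10 × 0.0004116 = 0.004116, and for `ASR_libri`, 0.2478 + 10 × 0.0005276 ≈ 0.2531. A sketch of that bookkeeping (function and variable names are mine; the scales come from the header config):

```python
# Per-task loss bookkeeping for the validation rows above.
SCALES = {"beats_loss": 1.0, "ecapa_loss": 10.0, "whisper_loss": 1.0}

def total_loss(components: dict) -> float:
    return sum(SCALES[name] * value for name, value in components.items())

# SV_voxceleb1 only exercises the ECAPA branch, so loss = 10 * ecapa_loss.
sv = {"beats_loss": 0.0, "ecapa_loss": 0.0004116, "whisper_loss": 0.0}
# ASR_libri combines the Whisper and ECAPA terms (loss=0.2531 in the log).
asr = {"beats_loss": 0.0, "ecapa_loss": 0.0005276, "whisper_loss": 0.2478}
```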
21 from LS+wenet, 31 from Vox, 43 fro AS 2024-08-18 09:13:56,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9050, loss[loss=0.09138, beats_loss=0.01044, ecapa_loss=0.0001546, whisper_loss=0.0794, over 18507.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001443, whisper_loss=0.08992, over 3882940.24 frames. ], batch size: 77, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:14:08,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3811830.0, ans=0.0 2024-08-18 09:14:08,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3811830.0, ans=0.125 2024-08-18 09:14:37,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.306e+01 2.516e+01 2.800e+01 4.042e+01, threshold=5.033e+01, percent-clipped=0.0 2024-08-18 09:14:38,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3812030.0, ans=0.125 2024-08-18 09:14:41,216 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 09:14:45,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3812030.0, ans=0.1 2024-08-18 09:14:47,914 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 09:14:52,331 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2024-08-18 09:14:55,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.11 vs. 
limit=12.0 2024-08-18 09:15:02,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9100, loss[loss=0.1047, beats_loss=0.01264, ecapa_loss=0.0001278, whisper_loss=0.09073, over 22669.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.09024, over 3872663.70 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:15:12,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3812230.0, ans=0.125 2024-08-18 09:15:14,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-08-18 09:15:17,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3812330.0, ans=0.04949747468305833 2024-08-18 09:15:20,279 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 29 from Vox, 29 fro AS 2024-08-18 09:15:21,557 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 09:15:34,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.17 vs. limit=15.0 2024-08-18 09:15:50,116 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 10 from Vox, 33 fro AS 2024-08-18 09:15:52,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3812630.0, ans=0.125 2024-08-18 09:15:57,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3812630.0, ans=0.125 2024-08-18 09:15:58,913 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
24 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 09:16:06,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9150, loss[loss=0.08824, beats_loss=0.01035, ecapa_loss=0.0001517, whisper_loss=0.07637, over 22566.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.000145, whisper_loss=0.08972, over 3872992.95 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:16:09,678 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 09:16:12,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3812730.0, ans=0.125 2024-08-18 09:16:14,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3812730.0, ans=0.125 2024-08-18 09:16:33,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3812930.0, ans=0.125 2024-08-18 09:16:35,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-08-18 09:16:36,091 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 09:16:47,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.602e+01 2.322e+01 2.560e+01 2.828e+01 4.789e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-18 09:16:50,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=3813030.0, ans=0.2 2024-08-18 09:16:57,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3813130.0, ans=0.1 2024-08-18 09:17:07,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3813130.0, ans=0.125 2024-08-18 09:17:10,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9200, loss[loss=0.08486, beats_loss=0.01182, ecapa_loss=0.0001695, whisper_loss=0.07134, over 21609.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001448, whisper_loss=0.08972, over 3903653.69 frames. ], batch size: 94, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:17:19,102 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 09:17:40,288 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.000e+05 2024-08-18 09:17:41,399 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 09:17:42,286 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 09:17:43,520 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 from AS 2024-08-18 09:17:47,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3813430.0, ans=0.1 2024-08-18 09:17:50,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=3813530.0, ans=0.02 2024-08-18 09:18:05,976 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 22 from Vox, 23 from AS 2024-08-18 09:18:08,726 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 09:18:14,895 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9250, loss[loss=0.1159, beats_loss=0.008499, ecapa_loss=0.0001534, whisper_loss=0.1059, over 20308.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.000145, whisper_loss=0.08947, over 3890801.28 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:18:23,726 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 11 from Vox, 26 from AS 2024-08-18 09:18:25,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3813730.0, ans=0.125 2024-08-18 09:18:26,286 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 09:18:32,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3813830.0, ans=0.125 2024-08-18 09:18:39,109 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 17 from Vox, 20 from AS 2024-08-18 09:18:54,706 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts.
25 from LS+wenet, 14 from Vox, 20 from AS 2024-08-18 09:18:55,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.301e+01 2.520e+01 2.839e+01 9.703e+01, threshold=5.041e+01, percent-clipped=1.0 2024-08-18 09:19:04,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3814130.0, ans=0.125 2024-08-18 09:19:11,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5 2024-08-18 09:19:17,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3814230.0, ans=0.2 2024-08-18 09:19:18,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9300, loss[loss=0.09765, beats_loss=0.01068, ecapa_loss=0.0001409, whisper_loss=0.08557, over 20889.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.09028, over 3912045.49 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:19:25,100 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.70 vs. limit=12.0 2024-08-18 09:19:30,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3814330.0, ans=0.125 2024-08-18 09:19:37,498 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 09:19:38,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3814330.0, ans=0.1 2024-08-18 09:19:43,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3814430.0, ans=0.07 2024-08-18 09:19:45,700 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 09:19:46,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3814430.0, ans=10.0 2024-08-18 09:19:52,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3814430.0, ans=0.125 2024-08-18 09:20:03,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3814530.0, ans=0.125 2024-08-18 09:20:09,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3814630.0, ans=0.0 2024-08-18 09:20:13,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3814630.0, ans=0.125 2024-08-18 09:20:20,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9350, loss[loss=0.09781, beats_loss=0.0121, ecapa_loss=0.0001006, whisper_loss=0.08471, over 21294.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001438, whisper_loss=0.08992, over 3882139.72 frames. ], batch size: 81, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:20:22,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3814730.0, ans=0.2 2024-08-18 09:20:32,350 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 18 from Vox, 34 from AS 2024-08-18 09:20:36,033 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 09:20:38,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3814830.0, ans=0.2 2024-08-18 09:20:48,015 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
28 from LS+wenet, 25 from Vox, 36 from AS 2024-08-18 09:20:54,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3814930.0, ans=0.125 2024-08-18 09:21:00,569 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.210e+01 2.465e+01 2.743e+01 3.638e+02, threshold=4.930e+01, percent-clipped=1.0 2024-08-18 09:21:02,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.56 vs. limit=15.0 2024-08-18 09:21:15,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3815130.0, ans=0.125 2024-08-18 09:21:21,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3815130.0, ans=0.0 2024-08-18 09:21:23,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9400, loss[loss=0.06497, beats_loss=0.01348, ecapa_loss=0.0001283, whisper_loss=0.05022, over 13390.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001443, whisper_loss=0.08981, over 3860166.65 frames.
], batch size: 55, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:21:25,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3815230.0, ans=0.125 2024-08-18 09:21:38,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3815330.0, ans=0.0 2024-08-18 09:21:39,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3815330.0, ans=0.125 2024-08-18 09:21:40,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3815330.0, ans=0.125 2024-08-18 09:21:42,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3815330.0, ans=0.1 2024-08-18 09:21:49,061 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-18 09:21:52,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3815430.0, ans=10.0 2024-08-18 09:21:54,765 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 30 from Vox, 29 from AS 2024-08-18 09:22:01,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2024-08-18 09:22:26,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9450, loss[loss=0.1214, beats_loss=0.008199, ecapa_loss=0.000156, whisper_loss=0.1116, over 22102.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001447, whisper_loss=0.08973, over 3878571.77 frames.
], batch size: 86, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:23:05,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.291e+01 2.521e+01 2.767e+01 4.094e+02, threshold=5.042e+01, percent-clipped=1.0 2024-08-18 09:23:20,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3816130.0, ans=0.125 2024-08-18 09:23:27,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9500, loss[loss=0.12, beats_loss=0.008698, ecapa_loss=0.0001819, whisper_loss=0.1095, over 19190.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01063, ecapa_loss=0.0001451, whisper_loss=0.08893, over 3865962.48 frames. ], batch size: 75, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:23:30,045 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 09:23:32,417 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 09:23:52,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3816430.0, ans=0.125 2024-08-18 09:24:05,488 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 09:24:27,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3816730.0, ans=0.04949747468305833 2024-08-18 09:24:28,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9550, loss[loss=0.1088, beats_loss=0.009705, ecapa_loss=0.0001556, whisper_loss=0.09757, over 22950.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001458, whisper_loss=0.0899, over 3902102.82 frames. ], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:24:33,637 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts.
19 from LS+wenet, 19 from Vox, 35 from AS 2024-08-18 09:25:08,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.361e+01 2.629e+01 2.923e+01 5.081e+01, threshold=5.257e+01, percent-clipped=1.0 2024-08-18 09:25:08,297 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 from AS 2024-08-18 09:25:15,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3817030.0, ans=0.125 2024-08-18 09:25:23,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3817130.0, ans=0.07 2024-08-18 09:25:26,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3817130.0, ans=0.0 2024-08-18 09:25:30,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9600, loss[loss=0.07903, beats_loss=0.009416, ecapa_loss=0.000161, whisper_loss=0.068, over 14732.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001465, whisper_loss=0.08949, over 3856079.61 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:25:40,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3817230.0, ans=0.0 2024-08-18 09:25:45,465 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 from AS 2024-08-18 09:26:07,730 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 17 from Vox, 22 from AS 2024-08-18 09:26:15,218 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 22 from Vox, 30 from AS 2024-08-18 09:26:15,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3817530.0, ans=0.0 2024-08-18 09:26:21,597 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts.
25 from LS+wenet, 20 from Vox, 27 from AS 2024-08-18 09:26:32,470 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9650, loss[loss=0.1148, beats_loss=0.009668, ecapa_loss=0.0001322, whisper_loss=0.1038, over 20803.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001474, whisper_loss=0.09023, over 3845480.26 frames. ], batch size: 80, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:26:34,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3817730.0, ans=0.1 2024-08-18 09:26:40,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3817730.0, ans=0.5 2024-08-18 09:27:01,575 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.608e+00 2024-08-18 09:27:11,839 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.349e+01 2.605e+01 2.994e+01 4.918e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-18 09:27:12,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3818030.0, ans=6.0 2024-08-18 09:27:27,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3818130.0, ans=0.125 2024-08-18 09:27:33,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9700, loss[loss=0.09881, beats_loss=0.01061, ecapa_loss=0.0001345, whisper_loss=0.08685, over 22185.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01037, ecapa_loss=0.0001472, whisper_loss=0.09059, over 3833088.61 frames.
], batch size: 88, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:27:58,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3818430.0, ans=0.125 2024-08-18 09:27:59,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3818430.0, ans=0.125 2024-08-18 09:28:03,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-18 09:28:36,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9750, loss[loss=0.1196, beats_loss=0.009427, ecapa_loss=0.000129, whisper_loss=0.1088, over 23449.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01044, ecapa_loss=0.0001464, whisper_loss=0.09018, over 3849947.91 frames. ], batch size: 91, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:28:39,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=3818730.0, ans=22.5 2024-08-18 09:28:41,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-18 09:29:01,270 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 from AS 2024-08-18 09:29:01,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3818930.0, ans=0.0 2024-08-18 09:29:05,006 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 09:29:05,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.35 vs. limit=15.0 2024-08-18 09:29:06,106 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
22 from LS+wenet, 21 from Vox, 47 from AS 2024-08-18 09:29:15,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.255e+01 2.464e+01 2.832e+01 2.481e+02, threshold=4.927e+01, percent-clipped=2.0 2024-08-18 09:29:15,902 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 12 from LS+wenet, 15 from Vox, 36 from AS 2024-08-18 09:29:20,844 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 09:29:24,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3819130.0, ans=0.2 2024-08-18 09:29:27,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3819130.0, ans=0.1 2024-08-18 09:29:30,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3819130.0, ans=0.0 2024-08-18 09:29:32,865 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 from AS 2024-08-18 09:29:37,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9800, loss[loss=0.09087, beats_loss=0.01231, ecapa_loss=0.000136, whisper_loss=0.0772, over 14395.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001457, whisper_loss=0.08993, over 3825909.29 frames. ], batch size: 54, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:29:52,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3819330.0, ans=0.1 2024-08-18 09:29:57,533 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
27 from LS+wenet, 10 from Vox, 37 from AS 2024-08-18 09:30:03,690 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 09:30:07,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3819430.0, ans=0.2 2024-08-18 09:30:33,839 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 from AS 2024-08-18 09:30:36,438 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 14 from LS+wenet, 34 from Vox, 30 from AS 2024-08-18 09:30:38,656 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9850, loss[loss=0.09418, beats_loss=0.01267, ecapa_loss=0.000129, whisper_loss=0.08021, over 21315.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001446, whisper_loss=0.09006, over 3842651.85 frames. ], batch size: 88, lr: 2.35e-03, grad_scale: 1.152921504606847e+18 2024-08-18 09:30:40,029 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 19 from Vox, 34 from AS 2024-08-18 09:30:45,960 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS 2024-08-18 09:31:07,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. limit=6.0 2024-08-18 09:31:07,850 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 18 from Vox, 40 from AS 2024-08-18 09:31:15,949 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.44 vs.
limit=15.0 2024-08-18 09:31:18,665 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.410e+01 2.708e+01 2.991e+01 3.936e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-18 09:31:23,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3820030.0, ans=0.125 2024-08-18 09:31:26,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3820130.0, ans=0.125 2024-08-18 09:31:31,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2024-08-18 09:31:39,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9900, loss[loss=0.1016, beats_loss=0.01079, ecapa_loss=0.0001366, whisper_loss=0.08943, over 18700.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01059, ecapa_loss=0.000145, whisper_loss=0.09095, over 3878132.78 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:31:42,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3820230.0, ans=0.125 2024-08-18 09:31:44,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3820230.0, ans=0.025 2024-08-18 09:31:51,351 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 16 from Vox, 21 from AS 2024-08-18 09:31:59,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-18 09:32:21,376 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 09:32:22,508 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts.
28 from LS+wenet, 16 from Vox, 39 from AS 2024-08-18 09:32:25,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3820530.0, ans=0.125 2024-08-18 09:32:31,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3820630.0, ans=0.1 2024-08-18 09:32:35,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3820630.0, ans=0.125 2024-08-18 09:32:38,888 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 from AS 2024-08-18 09:32:39,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3820630.0, ans=0.07 2024-08-18 09:32:42,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 9950, loss[loss=0.05985, beats_loss=0.01128, ecapa_loss=0.0001461, whisper_loss=0.04711, over 12997.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001453, whisper_loss=0.09018, over 3844868.11 frames. ], batch size: 53, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:32:53,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3820830.0, ans=0.05 2024-08-18 09:33:05,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.40 vs.
limit=15.0 2024-08-18 09:33:14,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3820930.0, ans=0.125 2024-08-18 09:33:22,697 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.257e+01 2.517e+01 2.867e+01 4.376e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-18 09:33:33,687 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=15.0 2024-08-18 09:33:34,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3821130.0, ans=0.0 2024-08-18 09:33:37,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3821130.0, ans=0.125 2024-08-18 09:33:38,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2024-08-18 09:33:39,263 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 09:33:43,921 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10000, loss[loss=0.1177, beats_loss=0.01172, ecapa_loss=0.0001955, whisper_loss=0.104, over 21725.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001456, whisper_loss=0.09051, over 3854457.87 frames. ], batch size: 92, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:33:54,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.42 vs. limit=15.0 2024-08-18 09:34:07,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs.
limit=15.0 2024-08-18 09:34:18,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.94 vs. limit=12.0 2024-08-18 09:34:29,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.37 vs. limit=22.5 2024-08-18 09:34:45,183 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10050, loss[loss=0.08468, beats_loss=0.01195, ecapa_loss=0.0001517, whisper_loss=0.07121, over 14805.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01052, ecapa_loss=0.0001452, whisper_loss=0.09053, over 3836611.52 frames. ], batch size: 63, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:35:02,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=12.0 2024-08-18 09:35:03,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3821830.0, ans=0.1 2024-08-18 09:35:18,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3821930.0, ans=0.125 2024-08-18 09:35:23,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-18 09:35:25,375 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.531e+01 2.231e+01 2.440e+01 2.652e+01 3.423e+01, threshold=4.880e+01, percent-clipped=0.0 2024-08-18 09:35:41,476 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
18 from LS+wenet, 19 from Vox, 32 from AS 2024-08-18 09:35:45,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3822230.0, ans=0.0 2024-08-18 09:35:45,977 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10100, loss[loss=0.09254, beats_loss=0.009637, ecapa_loss=0.0001412, whisper_loss=0.0815, over 15235.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.000145, whisper_loss=0.0907, over 3857210.86 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:35:52,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3822230.0, ans=0.025 2024-08-18 09:36:01,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3822330.0, ans=0.0 2024-08-18 09:36:08,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=3822330.0, ans=0.02 2024-08-18 09:36:17,286 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 09:36:19,867 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 from AS 2024-08-18 09:36:32,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3822530.0, ans=0.0 2024-08-18 09:36:43,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3822630.0, ans=0.125 2024-08-18 09:36:47,543 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10150, loss[loss=0.09265, beats_loss=0.009044, ecapa_loss=0.0001762, whisper_loss=0.08184, over 21699.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.000145, whisper_loss=0.09067, over 3880115.44 frames.
], batch size: 88, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:36:55,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2024-08-18 09:37:04,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=3822830.0, ans=0.04949747468305833 2024-08-18 09:37:28,037 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.258e+01 2.544e+01 2.982e+01 1.019e+02, threshold=5.088e+01, percent-clipped=1.0 2024-08-18 09:37:37,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3823130.0, ans=0.125 2024-08-18 09:37:38,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3823130.0, ans=0.125 2024-08-18 09:37:42,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=15.0 2024-08-18 09:37:42,916 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 22 from Vox, 21 from AS 2024-08-18 09:37:48,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10200, loss[loss=0.09111, beats_loss=0.0117, ecapa_loss=0.0001236, whisper_loss=0.07817, over 19115.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001449, whisper_loss=0.09031, over 3878099.43 frames. ], batch size: 73, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:37:49,068 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 25 from Vox, 30 from AS 2024-08-18 09:37:53,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3823230.0, ans=0.0 2024-08-18 09:38:00,347 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
24 from LS+wenet, 26 from Vox, 43 from AS 2024-08-18 09:38:03,094 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 16 from Vox, 31 from AS 2024-08-18 09:38:08,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3823330.0, ans=10.0 2024-08-18 09:38:19,316 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 27 from Vox, 23 from AS 2024-08-18 09:38:39,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3823630.0, ans=0.125 2024-08-18 09:38:51,510 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10250, loss[loss=0.09561, beats_loss=0.008982, ecapa_loss=0.0001652, whisper_loss=0.08498, over 13748.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001454, whisper_loss=0.08955, over 3892460.21 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:38:55,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3823730.0, ans=0.0 2024-08-18 09:39:34,723 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.298e+01 2.473e+01 2.719e+01 4.293e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 09:39:42,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3824130.0, ans=0.2 2024-08-18 09:39:46,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3824130.0, ans=0.2 2024-08-18 09:39:49,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3824130.0, ans=0.0 2024-08-18 09:39:56,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10300, loss[loss=0.101, beats_loss=0.00995, ecapa_loss=0.0001477, whisper_loss=0.08958, over 16816.00 frames.
], tot_loss[loss=0.1021, beats_loss=0.01058, ecapa_loss=0.0001456, whisper_loss=0.09005, over 3899584.69 frames. ], batch size: 66, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:40:14,229 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 09:40:27,841 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 09:40:34,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3824530.0, ans=0.125 2024-08-18 09:40:41,676 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 09:40:47,156 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 09:40:49,839 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 09:41:00,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3824730.0, ans=0.125 2024-08-18 09:41:01,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10350, loss[loss=0.08454, beats_loss=0.01396, ecapa_loss=0.0001097, whisper_loss=0.06949, over 14064.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01062, ecapa_loss=0.0001457, whisper_loss=0.09, over 3911621.42 frames. ], batch size: 57, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:41:02,477 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 09:41:03,662 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 09:41:14,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3824830.0, ans=0.125 2024-08-18 09:41:18,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=3824830.0, ans=0.2 2024-08-18 09:41:30,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3824930.0, ans=0.05 2024-08-18 09:41:42,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.342e+01 2.610e+01 2.810e+01 3.800e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-18 09:42:04,100 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10400, loss[loss=0.1074, beats_loss=0.009425, ecapa_loss=0.0001225, whisper_loss=0.09676, over 16226.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.000145, whisper_loss=0.08996, over 3877393.47 frames. ], batch size: 61, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:42:08,252 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 09:42:15,381 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
21 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 09:42:33,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3825430.0, ans=0.1 2024-08-18 09:42:33,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3825430.0, ans=0.125 2024-08-18 09:42:34,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3825430.0, ans=0.125 2024-08-18 09:42:34,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3825430.0, ans=0.2 2024-08-18 09:42:41,910 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 09:42:44,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-18 09:43:07,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10450, loss[loss=0.1166, beats_loss=0.009385, ecapa_loss=0.0001182, whisper_loss=0.106, over 23228.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.09002, over 3864106.39 frames. ], batch size: 89, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:43:09,983 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 13 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 09:43:14,944 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 09:43:27,984 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 09:43:37,770 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
15 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 09:43:42,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3825930.0, ans=0.0 2024-08-18 09:43:49,661 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.276e+01 2.479e+01 2.678e+01 3.882e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-18 09:43:54,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3826030.0, ans=0.09899494936611666 2024-08-18 09:43:55,173 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 09:44:07,875 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 09:44:11,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10500, loss[loss=0.101, beats_loss=0.01256, ecapa_loss=0.0001503, whisper_loss=0.08698, over 21236.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.0898, over 3814423.89 frames. ], batch size: 87, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:44:38,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.81 vs. limit=22.5 2024-08-18 09:44:41,957 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 09:44:44,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3826430.0, ans=0.0 2024-08-18 09:44:47,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3826430.0, ans=0.0 2024-08-18 09:44:51,605 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
26 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 09:45:07,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3826630.0, ans=0.0 2024-08-18 09:45:10,776 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 09:45:16,718 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 09:45:20,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10550, loss[loss=0.07923, beats_loss=0.01112, ecapa_loss=0.0001343, whisper_loss=0.06677, over 20844.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001466, whisper_loss=0.08948, over 3823708.12 frames. ], batch size: 84, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:45:27,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3826730.0, ans=0.1 2024-08-18 09:45:28,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3826730.0, ans=0.2 2024-08-18 09:45:30,571 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
16 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 09:45:35,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3826830.0, ans=0.0 2024-08-18 09:45:46,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3826830.0, ans=0.125 2024-08-18 09:45:50,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3826930.0, ans=0.0 2024-08-18 09:45:53,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3826930.0, ans=0.125 2024-08-18 09:45:53,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.86 vs. limit=15.0 2024-08-18 09:46:07,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.737e+01 2.328e+01 2.590e+01 2.836e+01 3.998e+01, threshold=5.181e+01, percent-clipped=0.0 2024-08-18 09:46:16,248 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 19 from LS+wenet, 15 from Vox, 54 fro AS 2024-08-18 09:46:20,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3827130.0, ans=0.125 2024-08-18 09:46:27,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-08-18 09:46:30,838 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10600, loss[loss=0.1293, beats_loss=0.009584, ecapa_loss=0.0001441, whisper_loss=0.1183, over 18113.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001467, whisper_loss=0.08979, over 3854244.68 frames. 
], batch size: 71, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:46:32,536 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 09:46:40,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.16 vs. limit=10.0 2024-08-18 09:46:45,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-08-18 09:47:03,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2024-08-18 09:47:20,218 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 13 from Vox, 43 fro AS 2024-08-18 09:47:26,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3827630.0, ans=0.125 2024-08-18 09:47:30,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3827630.0, ans=0.125 2024-08-18 09:47:31,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3827630.0, ans=0.125 2024-08-18 09:47:40,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. limit=15.0 2024-08-18 09:47:40,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10650, loss[loss=0.08557, beats_loss=0.01225, ecapa_loss=0.0001417, whisper_loss=0.07191, over 21623.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001448, whisper_loss=0.09026, over 3842279.83 frames. 
], batch size: 90, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:47:52,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3827730.0, ans=0.125 2024-08-18 09:47:53,147 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-18 09:47:59,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3827830.0, ans=0.95 2024-08-18 09:48:25,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.286e+01 2.479e+01 2.836e+01 4.249e+01, threshold=4.958e+01, percent-clipped=0.0 2024-08-18 09:48:28,936 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 09:48:33,425 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 09:48:48,987 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10700, loss[loss=0.102, beats_loss=0.009371, ecapa_loss=0.0001356, whisper_loss=0.09128, over 14986.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001436, whisper_loss=0.0904, over 3811695.44 frames. ], batch size: 59, lr: 2.35e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:49:22,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0 2024-08-18 09:49:30,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3828530.0, ans=0.0 2024-08-18 09:49:36,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.67 vs. 
limit=15.0 2024-08-18 09:49:42,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3828530.0, ans=0.125 2024-08-18 09:49:52,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3828630.0, ans=0.2 2024-08-18 09:49:58,724 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10750, loss[loss=0.09679, beats_loss=0.009081, ecapa_loss=0.0001299, whisper_loss=0.08641, over 18268.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001447, whisper_loss=0.0898, over 3833895.09 frames. ], batch size: 72, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:50:28,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3828930.0, ans=0.1 2024-08-18 09:50:35,165 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 09:50:43,103 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.343e+01 2.580e+01 2.880e+01 1.020e+02, threshold=5.160e+01, percent-clipped=2.0 2024-08-18 09:51:01,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3829130.0, ans=0.125 2024-08-18 09:51:02,118 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=15.0 2024-08-18 09:51:02,734 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 28 from Vox, 29 fro AS 2024-08-18 09:51:05,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10800, loss[loss=0.1093, beats_loss=0.008458, ecapa_loss=0.00016, whisper_loss=0.09927, over 22743.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001447, whisper_loss=0.0906, over 3841616.31 frames. 
], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:51:26,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-18 09:51:29,361 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 09:51:30,620 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 09:51:32,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3829430.0, ans=0.125 2024-08-18 09:51:37,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3829430.0, ans=0.125 2024-08-18 09:51:45,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3829530.0, ans=0.2 2024-08-18 09:51:45,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.48 vs. 
limit=22.5 2024-08-18 09:51:49,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3829530.0, ans=0.125 2024-08-18 09:51:51,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3829530.0, ans=0.05 2024-08-18 09:51:51,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3829530.0, ans=0.125 2024-08-18 09:51:55,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3829630.0, ans=0.0 2024-08-18 09:52:08,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10850, loss[loss=0.06895, beats_loss=0.0114, ecapa_loss=0.0001446, whisper_loss=0.05611, over 14096.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01061, ecapa_loss=0.0001444, whisper_loss=0.09034, over 3854020.91 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:52:24,676 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-18 09:52:24,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3829830.0, ans=0.125 2024-08-18 09:52:31,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3829830.0, ans=0.125 2024-08-18 09:52:36,768 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 09:52:42,205 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 09:52:43,333 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 09:52:48,056 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 09:52:49,080 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.278e+01 2.571e+01 2.919e+01 2.090e+02, threshold=5.141e+01, percent-clipped=1.0 2024-08-18 09:52:49,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3830030.0, ans=0.0 2024-08-18 09:53:10,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10900, loss[loss=0.1137, beats_loss=0.008927, ecapa_loss=0.0001535, whisper_loss=0.1032, over 20203.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.0001458, whisper_loss=0.09093, over 3879945.69 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:53:25,306 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 09:53:32,529 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-18 09:53:38,341 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.16 vs. limit=15.0 2024-08-18 09:53:42,734 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 15 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 09:53:47,528 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 09:53:48,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.06 vs. limit=22.5 2024-08-18 09:53:54,969 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 09:54:03,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3830630.0, ans=0.1 2024-08-18 09:54:12,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 10950, loss[loss=0.1177, beats_loss=0.009201, ecapa_loss=0.0001548, whisper_loss=0.107, over 22729.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.000145, whisper_loss=0.09095, over 3862466.62 frames. ], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:54:21,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3830730.0, ans=0.2 2024-08-18 09:54:37,523 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-18 09:54:42,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3830930.0, ans=0.0 2024-08-18 09:54:53,576 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.311e+01 2.560e+01 2.813e+01 5.362e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 09:54:56,013 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-18 09:55:07,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3831130.0, ans=0.0 2024-08-18 09:55:09,518 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 09:55:14,495 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11000, loss[loss=0.1236, beats_loss=0.007954, ecapa_loss=0.0001754, whisper_loss=0.1139, over 21601.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.09067, over 3871959.51 frames. 
], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:55:20,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3831230.0, ans=0.0 2024-08-18 09:55:22,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3831230.0, ans=0.125 2024-08-18 09:55:30,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3831330.0, ans=0.0 2024-08-18 09:55:37,611 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 09:55:45,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3831430.0, ans=0.125 2024-08-18 09:55:48,909 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 09:55:57,723 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.95 vs. limit=15.0 2024-08-18 09:56:02,037 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-18 09:56:03,430 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 09:56:06,069 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 09:56:11,104 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 09:56:15,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11050, loss[loss=0.09831, beats_loss=0.01277, ecapa_loss=0.0001459, whisper_loss=0.08408, over 21575.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001447, whisper_loss=0.09074, over 3857322.94 frames. 
], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:56:17,005 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 19 from LS+wenet, 27 from Vox, 45 fro AS 2024-08-18 09:56:22,112 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 09:56:24,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3831730.0, ans=0.125 2024-08-18 09:56:33,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.46 vs. limit=22.5 2024-08-18 09:56:34,591 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 09:56:36,948 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 09:56:51,022 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 09:56:51,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3831930.0, ans=0.2 2024-08-18 09:56:56,742 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.812e+01 2.378e+01 2.551e+01 2.783e+01 4.873e+01, threshold=5.103e+01, percent-clipped=0.0 2024-08-18 09:57:02,058 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 30 from Vox, 42 fro AS 2024-08-18 09:57:06,077 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 09:57:13,102 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
29 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 09:57:14,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3832130.0, ans=0.125 2024-08-18 09:57:17,710 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11100, loss[loss=0.08814, beats_loss=0.007645, ecapa_loss=0.0001944, whisper_loss=0.07855, over 13558.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01057, ecapa_loss=0.0001438, whisper_loss=0.09018, over 3864749.45 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:57:25,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3832230.0, ans=0.0 2024-08-18 09:57:38,873 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-18 09:57:39,688 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.90 vs. limit=15.0 2024-08-18 09:57:40,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3832330.0, ans=0.5 2024-08-18 09:57:49,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3832430.0, ans=0.1 2024-08-18 09:57:49,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-18 09:57:54,470 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2024-08-18 09:58:08,668 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 09:58:11,161 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 09:58:19,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11150, loss[loss=0.1056, beats_loss=0.01044, ecapa_loss=0.0001387, whisper_loss=0.09374, over 22541.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001448, whisper_loss=0.09082, over 3906407.13 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:58:20,091 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 09:58:21,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-18 09:58:24,980 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 09:58:31,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3832830.0, ans=0.125 2024-08-18 09:58:32,163 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 36 from Vox, 29 fro AS 2024-08-18 09:58:32,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3832830.0, ans=0.035 2024-08-18 09:58:33,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3832830.0, ans=0.0 2024-08-18 09:58:35,629 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 09:58:46,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.91 vs. 
limit=12.0 2024-08-18 09:58:53,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3832930.0, ans=0.125 2024-08-18 09:58:58,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3833030.0, ans=0.125 2024-08-18 09:59:00,324 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.313e+01 2.531e+01 2.915e+01 1.663e+02, threshold=5.062e+01, percent-clipped=1.0 2024-08-18 09:59:01,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3833030.0, ans=0.1 2024-08-18 09:59:03,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3833030.0, ans=10.0 2024-08-18 09:59:21,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11200, loss[loss=0.08065, beats_loss=0.01286, ecapa_loss=0.0001471, whisper_loss=0.06632, over 20464.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001441, whisper_loss=0.09081, over 3914686.30 frames. ], batch size: 87, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 09:59:30,162 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 09:59:34,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-08-18 09:59:42,622 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 09:59:46,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.04 vs. limit=12.0 2024-08-18 09:59:53,824 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 09:59:53,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3833430.0, ans=0.125 2024-08-18 10:00:07,802 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.94 vs. limit=22.5 2024-08-18 10:00:08,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3833530.0, ans=0.125 2024-08-18 10:00:12,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3833630.0, ans=0.0 2024-08-18 10:00:18,222 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 31 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 10:00:23,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11250, loss[loss=0.118, beats_loss=0.0116, ecapa_loss=0.0001356, whisper_loss=0.105, over 23310.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001442, whisper_loss=0.09152, over 3945439.86 frames. ], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:00:39,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3833830.0, ans=0.2 2024-08-18 10:00:51,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. limit=6.0 2024-08-18 10:01:04,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.304e+01 2.562e+01 2.921e+01 1.559e+02, threshold=5.123e+01, percent-clipped=1.0 2024-08-18 10:01:06,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3834030.0, ans=0.2 2024-08-18 10:01:17,291 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
32 from LS+wenet, 26 from Vox, 24 fro AS 2024-08-18 10:01:18,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3834130.0, ans=0.0 2024-08-18 10:01:23,721 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.640e-03 2024-08-18 10:01:25,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11300, loss[loss=0.09679, beats_loss=0.01014, ecapa_loss=0.0001312, whisper_loss=0.08534, over 20020.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01044, ecapa_loss=0.0001442, whisper_loss=0.09153, over 3931628.56 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:01:37,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=3834330.0, ans=0.5 2024-08-18 10:01:43,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3834330.0, ans=0.125 2024-08-18 10:01:45,558 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 10:02:02,105 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 10:02:02,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3834530.0, ans=0.1 2024-08-18 10:02:28,505 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11350, loss[loss=0.1141, beats_loss=0.008616, ecapa_loss=0.0001548, whisper_loss=0.1039, over 20679.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001445, whisper_loss=0.0913, over 3911944.85 frames. 
], batch size: 82, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:02:35,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3834730.0, ans=0.125 2024-08-18 10:02:37,575 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2024-08-18 10:02:44,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3834830.0, ans=0.125 2024-08-18 10:02:51,210 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 10:02:52,394 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-18 10:02:58,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3834930.0, ans=0.0 2024-08-18 10:03:08,771 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 10:03:09,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3835030.0, ans=0.0 2024-08-18 10:03:09,867 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.308e+01 2.551e+01 2.797e+01 2.772e+02, threshold=5.101e+01, percent-clipped=2.0 2024-08-18 10:03:13,704 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 10:03:30,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11400, loss[loss=0.09961, beats_loss=0.01209, ecapa_loss=9.874e-05, whisper_loss=0.08653, over 16224.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01042, ecapa_loss=0.0001441, whisper_loss=0.09215, over 3891232.47 frames. 
], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:03:43,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3835330.0, ans=0.0 2024-08-18 10:03:47,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3835330.0, ans=0.125 2024-08-18 10:03:49,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835330.0, ans=0.1 2024-08-18 10:03:57,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-08-18 10:03:58,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3835430.0, ans=0.1 2024-08-18 10:04:13,530 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-18 10:04:15,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835530.0, ans=0.1 2024-08-18 10:04:22,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.74 vs. limit=15.0 2024-08-18 10:04:23,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3835630.0, ans=0.1 2024-08-18 10:04:32,502 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11450, loss[loss=0.1032, beats_loss=0.008771, ecapa_loss=0.0001642, whisper_loss=0.09275, over 15136.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01049, ecapa_loss=0.0001442, whisper_loss=0.09146, over 3885391.51 frames. 
], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:04:40,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3835730.0, ans=0.125 2024-08-18 10:04:43,971 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-18 10:05:13,162 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.297e+01 2.481e+01 2.767e+01 4.172e+01, threshold=4.962e+01, percent-clipped=0.0 2024-08-18 10:05:17,192 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 10:05:18,526 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 10:05:20,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.75 vs. limit=22.5 2024-08-18 10:05:22,482 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.900e+01 2024-08-18 10:05:33,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11500, loss[loss=0.1134, beats_loss=0.009129, ecapa_loss=0.000142, whisper_loss=0.1029, over 22564.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001437, whisper_loss=0.09042, over 3886035.90 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:05:37,356 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 10:06:05,396 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:06:26,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3836630.0, ans=0.1 2024-08-18 10:06:31,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3836630.0, ans=0.125 2024-08-18 10:06:33,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2024-08-18 10:06:39,935 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11550, loss[loss=0.1154, beats_loss=0.01134, ecapa_loss=0.000139, whisper_loss=0.1027, over 19249.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001445, whisper_loss=0.09081, over 3883179.71 frames. ], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:06:52,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3836830.0, ans=0.125 2024-08-18 10:06:53,042 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 10:07:03,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3836830.0, ans=0.125 2024-08-18 10:07:11,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3836930.0, ans=0.0 2024-08-18 10:07:17,041 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 10:07:20,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3837030.0, ans=0.2 2024-08-18 10:07:26,132 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.337e+01 2.538e+01 2.781e+01 3.732e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-18 10:07:37,392 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-18 10:07:38,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3837130.0, ans=10.0 2024-08-18 10:07:44,581 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 10:07:52,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11600, loss[loss=0.1197, beats_loss=0.009581, ecapa_loss=0.0001608, whisper_loss=0.1085, over 21570.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001449, whisper_loss=0.09069, over 3896799.92 frames. ], batch size: 85, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:07:52,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3837230.0, ans=0.125 2024-08-18 10:07:55,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3837230.0, ans=0.125 2024-08-18 10:08:07,136 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2024-08-18 10:08:13,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.13 vs. 
limit=15.0 2024-08-18 10:08:21,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.57 vs. limit=15.0 2024-08-18 10:08:22,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3837430.0, ans=0.1 2024-08-18 10:08:26,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0 2024-08-18 10:08:27,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3837430.0, ans=0.125 2024-08-18 10:08:44,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.88 vs. limit=15.0 2024-08-18 10:08:46,956 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 10:08:58,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.80 vs. limit=10.0 2024-08-18 10:09:06,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11650, loss[loss=0.1035, beats_loss=0.008295, ecapa_loss=0.0001522, whisper_loss=0.09365, over 16666.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01048, ecapa_loss=0.0001441, whisper_loss=0.09018, over 3892769.61 frames. ], batch size: 66, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:09:07,430 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 10:09:08,764 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 10:09:10,475 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 10:09:12,108 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 10:09:22,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3837830.0, ans=0.125 2024-08-18 10:09:42,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3837930.0, ans=0.125 2024-08-18 10:09:47,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.73 vs. limit=22.5 2024-08-18 10:09:48,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3837930.0, ans=0.1 2024-08-18 10:09:56,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.353e+01 2.626e+01 3.025e+01 7.544e+01, threshold=5.251e+01, percent-clipped=1.0 2024-08-18 10:10:01,376 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 10:10:01,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3838030.0, ans=0.125 2024-08-18 10:10:12,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. 
limit=15.0 2024-08-18 10:10:16,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=3838130.0, ans=0.02 2024-08-18 10:10:21,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3838230.0, ans=0.125 2024-08-18 10:10:21,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11700, loss[loss=0.1075, beats_loss=0.007642, ecapa_loss=0.0001482, whisper_loss=0.09841, over 14301.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001436, whisper_loss=0.0898, over 3901099.67 frames. ], batch size: 54, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:10:35,742 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 10:10:42,497 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3838330.0, ans=0.125 2024-08-18 10:10:47,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3838330.0, ans=0.125 2024-08-18 10:10:51,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3838430.0, ans=0.125 2024-08-18 10:11:07,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3838530.0, ans=0.0 2024-08-18 10:11:10,421 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 10:11:16,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.79 vs. limit=10.0 2024-08-18 10:11:18,013 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 10:11:35,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11750, loss[loss=0.0997, beats_loss=0.01168, ecapa_loss=0.0001687, whisper_loss=0.08633, over 21673.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.000144, whisper_loss=0.09003, over 3914853.17 frames. ], batch size: 94, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:11:37,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3838730.0, ans=0.125 2024-08-18 10:11:44,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-08-18 10:11:50,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3838830.0, ans=0.0 2024-08-18 10:11:51,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3838830.0, ans=0.125 2024-08-18 10:11:53,033 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 10:11:53,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3838830.0, ans=0.07 2024-08-18 10:12:01,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3838830.0, ans=0.125 2024-08-18 10:12:24,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.258e+01 2.471e+01 2.735e+01 3.598e+01, threshold=4.941e+01, percent-clipped=0.0 2024-08-18 10:12:39,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.32 vs. 
limit=15.0 2024-08-18 10:12:47,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2024-08-18 10:12:51,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11800, loss[loss=0.09658, beats_loss=0.01155, ecapa_loss=0.0001268, whisper_loss=0.08376, over 14074.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01064, ecapa_loss=0.0001445, whisper_loss=0.09012, over 3906309.92 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:13:03,055 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 10:13:17,140 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 10:13:18,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3839430.0, ans=0.0 2024-08-18 10:13:27,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.84 vs. 
limit=15.0 2024-08-18 10:13:31,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3839430.0, ans=0.125 2024-08-18 10:13:39,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3839530.0, ans=0.1 2024-08-18 10:13:55,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3839630.0, ans=0.95 2024-08-18 10:14:00,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3839630.0, ans=0.125 2024-08-18 10:14:03,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3839730.0, ans=0.125 2024-08-18 10:14:04,418 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11850, loss[loss=0.08337, beats_loss=0.01322, ecapa_loss=0.0001296, whisper_loss=0.06885, over 14179.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01059, ecapa_loss=0.0001455, whisper_loss=0.09047, over 3901751.20 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:14:10,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3839730.0, ans=0.0 2024-08-18 10:14:22,269 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 10:14:24,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3839830.0, ans=0.125 2024-08-18 10:14:25,818 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 10:14:26,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3839830.0, ans=0.125 2024-08-18 10:14:40,275 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 34 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 10:14:47,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3839930.0, ans=0.1 2024-08-18 10:14:56,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.290e+01 2.585e+01 2.979e+01 4.833e+01, threshold=5.171e+01, percent-clipped=0.0 2024-08-18 10:14:56,885 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:14:56,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3840030.0, ans=0.1 2024-08-18 10:14:58,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3840030.0, ans=0.025 2024-08-18 10:15:06,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.43 vs. limit=12.0 2024-08-18 10:15:19,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11900, loss[loss=0.1047, beats_loss=0.01202, ecapa_loss=0.0001197, whisper_loss=0.09149, over 23321.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001452, whisper_loss=0.09055, over 3945668.56 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:15:27,150 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
26 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-18 10:15:33,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3840330.0, ans=0.025 2024-08-18 10:15:45,602 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:16:21,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-18 10:16:29,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3840730.0, ans=0.0 2024-08-18 10:16:30,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 11950, loss[loss=0.08744, beats_loss=0.00935, ecapa_loss=0.0002222, whisper_loss=0.07587, over 14185.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001454, whisper_loss=0.0908, over 3928282.08 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:16:33,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-18 10:17:11,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3840930.0, ans=0.125 2024-08-18 10:17:20,861 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.256e+01 2.516e+01 2.795e+01 5.453e+01, threshold=5.033e+01, percent-clipped=1.0 2024-08-18 10:17:45,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12000, loss[loss=0.09183, beats_loss=0.01035, ecapa_loss=0.0001256, whisper_loss=0.08022, over 16442.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001459, whisper_loss=0.09007, over 3907516.99 frames. 
], batch size: 63, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:17:45,303 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 10:18:21,620 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005311, whisper_loss=0.2478, over 922467.00 frames. 2024-08-18 10:18:36,737 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.2738, 2.0851, 2.6018, 2.6507], device='cuda:3') 2024-08-18 10:18:40,319 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on SV_voxceleb1: loss=0.004077, beats_loss=0, ecapa_loss=0.0004077, whisper_loss=0, over 939242.00 frames. 2024-08-18 10:20:17,351 INFO [train_multi_KD3.py:1149] (3/4) Epoch 26, validation on AT_audioset: loss=0.02316, beats_loss=0.02316, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 10:20:17,355 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 10:20:23,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3841230.0, ans=0.0 2024-08-18 10:20:29,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3841230.0, ans=0.125 2024-08-18 10:20:29,882 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 10:20:44,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3841430.0, ans=0.125 2024-08-18 10:20:46,842 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
19 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-18 10:20:51,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3841430.0, ans=0.1 2024-08-18 10:20:53,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3841430.0, ans=0.125 2024-08-18 10:21:30,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12050, loss[loss=0.1023, beats_loss=0.01127, ecapa_loss=0.0001097, whisper_loss=0.08994, over 16287.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001453, whisper_loss=0.09014, over 3910596.69 frames. ], batch size: 60, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:21:32,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=15.0 2024-08-18 10:21:41,284 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 10:21:47,418 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 10:22:12,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=3841930.0, ans=0.025 2024-08-18 10:22:12,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3841930.0, ans=0.2 2024-08-18 10:22:21,978 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 10:22:24,309 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.275e+01 2.552e+01 2.886e+01 4.482e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-18 10:22:48,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12100, loss[loss=0.09695, beats_loss=0.01109, ecapa_loss=0.0001624, whisper_loss=0.08423, over 19094.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001457, whisper_loss=0.0902, over 3869494.94 frames. ], batch size: 79, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:22:49,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2024-08-18 10:22:58,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=15.0 2024-08-18 10:23:06,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3842330.0, ans=0.0 2024-08-18 10:23:10,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3842330.0, ans=0.0 2024-08-18 10:23:13,659 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 10:23:28,052 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 10:23:29,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3842430.0, ans=0.1 2024-08-18 10:23:29,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3842430.0, ans=0.125 2024-08-18 10:23:33,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3842530.0, ans=0.0 2024-08-18 10:23:55,902 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.628e+00 2024-08-18 10:24:00,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.99 vs. 
limit=15.0 2024-08-18 10:24:04,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12150, loss[loss=0.09976, beats_loss=0.01263, ecapa_loss=0.0001308, whisper_loss=0.08582, over 22995.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.000147, whisper_loss=0.09015, over 3838924.26 frames. ], batch size: 93, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:24:15,270 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 10:24:21,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3842830.0, ans=22.5 2024-08-18 10:24:28,069 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.44 vs. limit=10.0 2024-08-18 10:24:29,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3842830.0, ans=0.1 2024-08-18 10:24:34,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3842930.0, ans=0.04949747468305833 2024-08-18 10:24:34,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.24 vs. limit=15.0 2024-08-18 10:24:37,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=22.5 2024-08-18 10:24:44,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3842930.0, ans=10.0 2024-08-18 10:24:51,900 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 10:24:54,706 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.341e+01 2.544e+01 2.867e+01 3.722e+01, threshold=5.088e+01, percent-clipped=0.0 2024-08-18 10:25:15,118 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-18 10:25:19,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12200, loss[loss=0.1024, beats_loss=0.01123, ecapa_loss=0.0001207, whisper_loss=0.08992, over 16434.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001462, whisper_loss=0.09036, over 3812609.97 frames. ], batch size: 64, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:25:25,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3843230.0, ans=0.125 2024-08-18 10:25:54,345 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.47 vs. limit=22.5 2024-08-18 10:26:20,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2024-08-18 10:26:31,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs. limit=10.0 2024-08-18 10:26:42,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12250, loss[loss=0.1224, beats_loss=0.009855, ecapa_loss=0.0001594, whisper_loss=0.1109, over 23566.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001464, whisper_loss=0.09079, over 3836983.29 frames. ], batch size: 95, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:27:01,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2024-08-18 10:27:04,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3843830.0, ans=0.1 2024-08-18 10:27:09,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3843830.0, ans=0.125 2024-08-18 10:27:19,621 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 10:27:21,663 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 14 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 10:27:22,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3843930.0, ans=0.125 2024-08-18 10:27:23,992 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=12.0 2024-08-18 10:27:26,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3843930.0, ans=0.0 2024-08-18 10:27:36,178 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.791e+01 2.268e+01 2.517e+01 2.839e+01 6.711e+01, threshold=5.035e+01, percent-clipped=1.0 2024-08-18 10:27:37,973 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 10:27:46,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3844130.0, ans=0.0 2024-08-18 10:27:47,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=3844130.0, ans=0.09899494936611666 2024-08-18 10:27:58,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3844130.0, ans=0.125 2024-08-18 10:28:01,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12300, loss[loss=0.1106, beats_loss=0.00965, ecapa_loss=0.0001767, whisper_loss=0.09915, over 21612.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001459, whisper_loss=0.09087, over 3862241.83 frames. ], batch size: 89, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:28:02,375 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 10:28:10,595 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 10:28:15,588 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 10:28:23,858 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 10:28:39,513 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 9 from Vox, 43 fro AS 2024-08-18 10:28:39,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3844430.0, ans=0.125 2024-08-18 10:28:42,841 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 10:28:48,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3844430.0, ans=0.025 2024-08-18 10:28:56,871 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 10:29:02,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3844530.0, ans=0.1 2024-08-18 10:29:07,928 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-18 10:29:20,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3844630.0, ans=0.0 2024-08-18 10:29:21,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3844630.0, ans=0.125 2024-08-18 10:29:23,534 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12350, loss[loss=0.1342, beats_loss=0.008647, ecapa_loss=0.0001449, whisper_loss=0.1241, over 15500.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001459, whisper_loss=0.09078, over 3864790.82 frames. ], batch size: 61, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:29:30,780 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 10:29:31,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3844730.0, ans=0.125 2024-08-18 10:29:50,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3844830.0, ans=0.125 2024-08-18 10:30:00,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.54 vs. 
limit=22.5 2024-08-18 10:30:07,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2024-08-18 10:30:14,994 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 10:30:16,845 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 10:30:19,974 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.330e+01 2.553e+01 2.743e+01 3.939e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-18 10:30:29,504 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 10:30:42,971 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.80 vs. limit=12.0 2024-08-18 10:30:46,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12400, loss[loss=0.1195, beats_loss=0.007952, ecapa_loss=0.0001696, whisper_loss=0.1099, over 19913.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001465, whisper_loss=0.09023, over 3884640.98 frames. 
], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:30:50,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3845230.0, ans=0.125 2024-08-18 10:31:28,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3845430.0, ans=0.2 2024-08-18 10:31:35,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3845530.0, ans=0.1 2024-08-18 10:31:42,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3845530.0, ans=0.5 2024-08-18 10:31:59,571 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-18 10:32:02,582 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 10:32:07,464 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12450, loss[loss=0.1065, beats_loss=0.00903, ecapa_loss=0.0001764, whisper_loss=0.09566, over 14867.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0105, ecapa_loss=0.0001461, whisper_loss=0.08966, over 3867686.95 frames. ], batch size: 59, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:32:11,879 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 10:32:14,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3845730.0, ans=0.1 2024-08-18 10:32:26,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3845830.0, ans=0.125 2024-08-18 10:32:38,239 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.70 vs. 
limit=15.0 2024-08-18 10:32:42,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=3845930.0, ans=15.0 2024-08-18 10:32:52,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3845930.0, ans=0.125 2024-08-18 10:33:04,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.360e+01 2.669e+01 3.014e+01 4.345e+01, threshold=5.338e+01, percent-clipped=0.0 2024-08-18 10:33:17,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-08-18 10:33:21,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3846130.0, ans=0.0 2024-08-18 10:33:24,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3846130.0, ans=0.1 2024-08-18 10:33:31,217 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12500, loss[loss=0.1086, beats_loss=0.01153, ecapa_loss=0.000132, whisper_loss=0.09578, over 23221.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001446, whisper_loss=0.08981, over 3891123.96 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:33:43,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3846230.0, ans=0.0 2024-08-18 10:33:43,883 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.84 vs. limit=22.5 2024-08-18 10:33:45,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. 
limit=10.0 2024-08-18 10:34:06,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3846430.0, ans=0.125 2024-08-18 10:34:17,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.57 vs. limit=22.5 2024-08-18 10:34:25,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3846530.0, ans=0.1 2024-08-18 10:34:29,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3846630.0, ans=0.0 2024-08-18 10:34:46,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12550, loss[loss=0.1158, beats_loss=0.009141, ecapa_loss=0.0001516, whisper_loss=0.1051, over 22955.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001443, whisper_loss=0.09005, over 3904990.50 frames. ], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:34:59,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3846730.0, ans=0.125 2024-08-18 10:35:11,207 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 10:35:11,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3846830.0, ans=0.125 2024-08-18 10:35:14,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3846830.0, ans=0.07 2024-08-18 10:35:14,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3846830.0, ans=0.07 2024-08-18 10:35:15,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.97 vs. limit=15.0 2024-08-18 10:35:20,932 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 10:35:24,008 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 10:35:41,240 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.783e+01 2.372e+01 2.623e+01 3.122e+01 3.895e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-18 10:35:52,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-18 10:36:05,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12600, loss[loss=0.09021, beats_loss=0.01216, ecapa_loss=0.0001345, whisper_loss=0.0767, over 14157.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001457, whisper_loss=0.09069, over 3911060.83 frames. 
], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:36:09,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3847230.0, ans=0.0 2024-08-18 10:36:11,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3847230.0, ans=0.125 2024-08-18 10:36:12,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2024-08-18 10:36:14,506 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-18 10:36:24,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3847330.0, ans=0.2 2024-08-18 10:36:35,487 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 10:36:35,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3847330.0, ans=0.1 2024-08-18 10:36:37,565 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 10:36:44,587 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 10:37:12,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3847630.0, ans=0.125 2024-08-18 10:37:15,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=3847630.0, ans=0.5 2024-08-18 10:37:18,523 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
8 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-18 10:37:27,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12650, loss[loss=0.1115, beats_loss=0.008392, ecapa_loss=0.0001429, whisper_loss=0.1016, over 23697.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001451, whisper_loss=0.09021, over 3914843.15 frames. ], batch size: 93, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:37:58,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3847930.0, ans=0.125 2024-08-18 10:38:13,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3848030.0, ans=0.1 2024-08-18 10:38:15,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3848030.0, ans=0.125 2024-08-18 10:38:16,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3848030.0, ans=0.125 2024-08-18 10:38:18,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3848030.0, ans=0.125 2024-08-18 10:38:20,864 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.252e+01 2.537e+01 2.887e+01 4.368e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 10:38:38,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3848130.0, ans=0.0 2024-08-18 10:38:47,362 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:38:48,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12700, loss[loss=0.0963, beats_loss=0.01056, ecapa_loss=0.0001364, whisper_loss=0.08438, over 17794.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01069, ecapa_loss=0.0001452, whisper_loss=0.08944, over 3933413.53 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:38:58,161 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 10:39:11,582 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 10:39:37,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.37 vs. limit=10.0 2024-08-18 10:39:42,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3848530.0, ans=0.0 2024-08-18 10:39:52,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3848630.0, ans=0.0 2024-08-18 10:39:53,221 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-18 10:39:53,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. limit=15.0 2024-08-18 10:39:55,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.49 vs. limit=15.0 2024-08-18 10:40:09,659 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12750, loss[loss=0.1066, beats_loss=0.0119, ecapa_loss=0.0001305, whisper_loss=0.09343, over 14283.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.000145, whisper_loss=0.08991, over 3926297.11 frames. ], batch size: 55, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:40:13,763 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 10:40:18,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3848730.0, ans=0.0 2024-08-18 10:40:35,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3848830.0, ans=0.2 2024-08-18 10:40:37,279 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.79 vs. limit=6.0 2024-08-18 10:41:04,851 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.297e+01 2.524e+01 2.809e+01 4.213e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 10:41:06,822 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-18 10:41:29,562 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12800, loss[loss=0.09922, beats_loss=0.01117, ecapa_loss=0.0001542, whisper_loss=0.0865, over 21437.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01068, ecapa_loss=0.0001457, whisper_loss=0.08985, over 3975809.64 frames. ], batch size: 90, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:41:30,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3849230.0, ans=0.125 2024-08-18 10:41:42,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.92 vs. 
limit=22.5 2024-08-18 10:42:02,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=3849430.0, ans=15.0 2024-08-18 10:42:17,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3849430.0, ans=0.125 2024-08-18 10:42:22,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3849530.0, ans=0.125 2024-08-18 10:42:49,784 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12850, loss[loss=0.1019, beats_loss=0.01306, ecapa_loss=7.997e-05, whisper_loss=0.08808, over 15602.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01065, ecapa_loss=0.0001456, whisper_loss=0.09021, over 3943053.00 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:42:51,879 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 12 from Vox, 32 fro AS 2024-08-18 10:42:54,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3849730.0, ans=0.2 2024-08-18 10:43:00,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3849730.0, ans=0.1 2024-08-18 10:43:08,568 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 10:43:25,433 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 10:43:41,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.317e+01 2.573e+01 2.906e+01 6.087e+01, threshold=5.147e+01, percent-clipped=1.0 2024-08-18 10:43:42,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3850030.0, ans=0.125 2024-08-18 10:44:04,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12900, loss[loss=0.07979, beats_loss=0.01205, ecapa_loss=0.0001144, whisper_loss=0.0666, over 18843.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001447, whisper_loss=0.08967, over 3935085.65 frames. ], batch size: 77, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:44:04,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3850230.0, ans=0.125 2024-08-18 10:44:13,953 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 10:44:23,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.44 vs. limit=22.5 2024-08-18 10:44:33,470 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 10:44:47,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3850430.0, ans=0.2 2024-08-18 10:44:54,472 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
25 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 10:44:56,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3850530.0, ans=0.125 2024-08-18 10:45:05,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3850530.0, ans=0.125 2024-08-18 10:45:08,115 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 10:45:08,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=12.0 2024-08-18 10:45:23,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 12950, loss[loss=0.09607, beats_loss=0.008178, ecapa_loss=0.0001493, whisper_loss=0.08639, over 16210.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01054, ecapa_loss=0.0001451, whisper_loss=0.09028, over 3916291.58 frames. ], batch size: 62, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:45:25,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3850730.0, ans=0.0 2024-08-18 10:45:26,224 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 10:45:29,208 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 10:45:37,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3850830.0, ans=0.0 2024-08-18 10:46:10,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3851030.0, ans=0.125 2024-08-18 10:46:13,012 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.206e+01 2.482e+01 2.888e+01 5.100e+01, threshold=4.963e+01, percent-clipped=0.0 2024-08-18 10:46:23,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3851130.0, ans=0.0 2024-08-18 10:46:31,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=12.0 2024-08-18 10:46:36,862 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13000, loss[loss=0.1171, beats_loss=0.008431, ecapa_loss=0.0001837, whisper_loss=0.1068, over 18355.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001455, whisper_loss=0.08984, over 3899524.99 frames. ], batch size: 72, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:46:42,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=3851230.0, ans=0.0 2024-08-18 10:46:50,216 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 10:46:50,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3851230.0, ans=0.2 2024-08-18 10:46:55,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=3851330.0, ans=0.0 2024-08-18 10:47:00,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3851330.0, ans=0.125 2024-08-18 10:47:06,905 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 10:47:16,183 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 10:47:35,325 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-18 10:47:40,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-18 10:47:53,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13050, loss[loss=0.109, beats_loss=0.008104, ecapa_loss=0.000158, whisper_loss=0.09928, over 21478.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001452, whisper_loss=0.09003, over 3908716.11 frames. ], batch size: 85, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:48:01,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3851730.0, ans=0.05 2024-08-18 10:48:35,930 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-08-18 10:48:47,362 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 10:48:48,285 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.212e+01 2.477e+01 2.743e+01 3.903e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-18 10:48:49,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3852030.0, ans=0.1 2024-08-18 10:48:53,358 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0 2024-08-18 10:49:04,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3852130.0, ans=0.1 2024-08-18 10:49:15,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3852230.0, ans=0.2 2024-08-18 10:49:15,856 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13100, loss[loss=0.092, beats_loss=0.01185, ecapa_loss=0.0001573, whisper_loss=0.07857, over 17955.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.0001443, whisper_loss=0.09022, over 3910304.71 frames. ], batch size: 73, lr: 2.34e-03, grad_scale: 5.764607523034235e+17 2024-08-18 10:49:18,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3852230.0, ans=0.125 2024-08-18 10:49:25,853 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 34 from Vox, 28 fro AS 2024-08-18 10:50:03,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3852530.0, ans=0.07 2024-08-18 10:50:18,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.36 vs. 
limit=22.5
2024-08-18 10:50:30,864 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13150, loss[loss=0.1069, beats_loss=0.0113, ecapa_loss=0.0001373, whisper_loss=0.09425, over 21858.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001458, whisper_loss=0.08972, over 3902292.89 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:50:31,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3852730.0, ans=0.125
2024-08-18 10:50:45,576 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 33 from LS+wenet, 17 from Vox, 31 from AS
2024-08-18 10:50:56,818 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 from AS
2024-08-18 10:51:02,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0
2024-08-18 10:51:19,063 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.841e+01 2.362e+01 2.528e+01 2.819e+01 1.566e+02, threshold=5.056e+01, percent-clipped=2.0
2024-08-18 10:51:42,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13200, loss[loss=0.09677, beats_loss=0.009156, ecapa_loss=0.0001548, whisper_loss=0.08606, over 18910.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001456, whisper_loss=0.08971, over 3896618.44 frames. ], batch size: 74, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:51:44,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=3853230.0, ans=0.2
2024-08-18 10:51:52,569 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 17 from Vox, 31 from AS
2024-08-18 10:52:04,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3853330.0, ans=0.2
2024-08-18 10:52:24,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3853430.0, ans=0.2
2024-08-18 10:52:25,742 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 10:52:59,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13250, loss[loss=0.1105, beats_loss=0.01124, ecapa_loss=0.0001169, whisper_loss=0.09805, over 23675.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001448, whisper_loss=0.08968, over 3897479.39 frames. ], batch size: 92, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:53:11,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3853730.0, ans=0.125
2024-08-18 10:53:12,133 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 14 from Vox, 33 from AS
2024-08-18 10:53:12,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3853830.0, ans=0.125
2024-08-18 10:53:28,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0
2024-08-18 10:53:46,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3854030.0, ans=0.07
2024-08-18 10:53:47,023 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.315e+01 2.671e+01 3.107e+01 9.539e+01, threshold=5.342e+01, percent-clipped=1.0
2024-08-18 10:54:03,119 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-18 10:54:03,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3854130.0, ans=0.1
2024-08-18 10:54:04,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3854130.0, ans=0.1
2024-08-18 10:54:09,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13300, loss[loss=0.09855, beats_loss=0.01137, ecapa_loss=0.0001134, whisper_loss=0.08604, over 16530.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001445, whisper_loss=0.08976, over 3880436.18 frames. ], batch size: 64, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:54:11,148 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 29 from Vox, 42 from AS
2024-08-18 10:54:14,381 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 from AS
2024-08-18 10:54:14,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0
2024-08-18 10:54:18,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3854230.0, ans=0.125
2024-08-18 10:55:14,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0
2024-08-18 10:55:19,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13350, loss[loss=0.104, beats_loss=0.01022, ecapa_loss=0.0001509, whisper_loss=0.09229, over 15574.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001443, whisper_loss=0.09045, over 3877598.51 frames. ], batch size: 65, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:55:34,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0
2024-08-18 10:55:44,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3854830.0, ans=0.125
2024-08-18 10:55:58,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3854930.0, ans=0.125
2024-08-18 10:56:05,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.282e+01 2.499e+01 2.710e+01 4.906e+01, threshold=4.999e+01, percent-clipped=0.0
2024-08-18 10:56:27,152 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 16 from Vox, 22 from AS
2024-08-18 10:56:28,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13400, loss[loss=0.08969, beats_loss=0.01013, ecapa_loss=0.0001795, whisper_loss=0.07777, over 12991.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001445, whisper_loss=0.09019, over 3866751.87 frames. ], batch size: 53, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:56:28,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3855230.0, ans=0.0
2024-08-18 10:56:29,690 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 10:56:38,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.38 vs. limit=12.0
2024-08-18 10:56:51,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3855330.0, ans=0.2
2024-08-18 10:56:54,161 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 14 from Vox, 32 from AS
2024-08-18 10:56:55,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3855430.0, ans=0.125
2024-08-18 10:57:06,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3855430.0, ans=0.1
2024-08-18 10:57:08,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3855530.0, ans=0.2
2024-08-18 10:57:28,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3855630.0, ans=0.2
2024-08-18 10:57:33,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3855630.0, ans=0.1
2024-08-18 10:57:34,659 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 15 from Vox, 39 from AS
2024-08-18 10:57:36,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13450, loss[loss=0.1172, beats_loss=0.007869, ecapa_loss=0.0001598, whisper_loss=0.1077, over 20497.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001446, whisper_loss=0.09018, over 3864498.95 frames. ], batch size: 80, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:57:46,124 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 from AS
2024-08-18 10:58:05,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3855930.0, ans=0.025
2024-08-18 10:58:20,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.386e+01 2.619e+01 2.833e+01 4.407e+01, threshold=5.238e+01, percent-clipped=0.0
2024-08-18 10:58:28,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3856130.0, ans=0.0
2024-08-18 10:58:32,117 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 from AS
2024-08-18 10:58:41,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13500, loss[loss=0.1183, beats_loss=0.01033, ecapa_loss=0.0001199, whisper_loss=0.1068, over 15463.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.000145, whisper_loss=0.0901, over 3868427.13 frames. ], batch size: 57, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:58:44,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3856230.0, ans=0.035
2024-08-18 10:58:52,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0
2024-08-18 10:58:57,477 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 23 from Vox, 44 from AS
2024-08-18 10:59:06,661 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 24 from Vox, 33 from AS
2024-08-18 10:59:07,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.14 vs. limit=15.0
2024-08-18 10:59:09,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3856430.0, ans=0.2
2024-08-18 10:59:10,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3856430.0, ans=0.125
2024-08-18 10:59:22,502 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 23 from LS+wenet, 15 from Vox, 18 from AS
2024-08-18 10:59:30,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3856530.0, ans=0.125
2024-08-18 10:59:32,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.72 vs. limit=15.0
2024-08-18 10:59:36,268 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 18 from Vox, 36 from AS
2024-08-18 10:59:39,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3856630.0, ans=0.125
2024-08-18 10:59:39,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3856630.0, ans=0.125
2024-08-18 10:59:47,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13550, loss[loss=0.0927, beats_loss=0.01105, ecapa_loss=0.0001041, whisper_loss=0.08061, over 22881.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01056, ecapa_loss=0.0001448, whisper_loss=0.08989, over 3847939.15 frames. ], batch size: 88, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 10:59:50,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3856730.0, ans=0.0
2024-08-18 11:00:11,763 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 25 from Vox, 37 from AS
2024-08-18 11:00:14,656 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.33 vs. limit=10.0
2024-08-18 11:00:30,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.267e+01 2.460e+01 2.738e+01 4.250e+01, threshold=4.920e+01, percent-clipped=0.0
2024-08-18 11:00:46,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3857130.0, ans=0.2
2024-08-18 11:00:51,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13600, loss[loss=0.1037, beats_loss=0.00931, ecapa_loss=0.0001492, whisper_loss=0.09293, over 18703.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001444, whisper_loss=0.08989, over 3881833.07 frames. ], batch size: 78, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:00:57,209 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 from AS
2024-08-18 11:01:02,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.22 vs. limit=15.0
2024-08-18 11:01:14,490 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 from AS
2024-08-18 11:01:17,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3857430.0, ans=0.1
2024-08-18 11:01:21,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3857430.0, ans=0.1
2024-08-18 11:01:22,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3857430.0, ans=0.1
2024-08-18 11:01:27,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3857430.0, ans=0.125
2024-08-18 11:01:35,671 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 28 from LS+wenet, 12 from Vox, 23 from AS
2024-08-18 11:01:47,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3857630.0, ans=0.0
2024-08-18 11:01:57,866 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13650, loss[loss=0.1016, beats_loss=0.01073, ecapa_loss=0.0001351, whisper_loss=0.08952, over 19416.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01069, ecapa_loss=0.0001436, whisper_loss=0.08961, over 3876327.56 frames. ], batch size: 75, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:02:07,272 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 22 from Vox, 28 from AS
2024-08-18 11:02:24,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3857930.0, ans=0.125
2024-08-18 11:02:43,482 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.717e+01 2.270e+01 2.557e+01 2.779e+01 4.099e+01, threshold=5.114e+01, percent-clipped=0.0
2024-08-18 11:03:05,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13700, loss[loss=0.08527, beats_loss=0.01377, ecapa_loss=0.0001472, whisper_loss=0.07003, over 22471.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01064, ecapa_loss=0.0001449, whisper_loss=0.0904, over 3905257.44 frames. ], batch size: 93, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:03:13,309 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 from AS
2024-08-18 11:03:16,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3858230.0, ans=0.1
2024-08-18 11:03:25,490 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 from AS
2024-08-18 11:03:43,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.17 vs. limit=15.0
2024-08-18 11:03:43,703 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 12 from LS+wenet, 21 from Vox, 29 from AS
2024-08-18 11:04:03,422 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 22 from Vox, 29 from AS
2024-08-18 11:04:05,177 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-08-18 11:04:09,488 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=3.746e+01
2024-08-18 11:04:10,948 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=22.5
2024-08-18 11:04:15,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13750, loss[loss=0.09097, beats_loss=0.01282, ecapa_loss=0.0001524, whisper_loss=0.07663, over 21713.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001454, whisper_loss=0.09086, over 3889269.70 frames. ], batch size: 91, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:04:19,464 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 25 from Vox, 41 from AS
2024-08-18 11:04:24,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3858730.0, ans=0.125
2024-08-18 11:04:27,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0
2024-08-18 11:04:27,906 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 25 from Vox, 25 from AS
2024-08-18 11:04:36,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=3858830.0, ans=6.0
2024-08-18 11:04:37,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0
2024-08-18 11:04:40,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3858830.0, ans=0.125
2024-08-18 11:04:47,815 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.33 vs. limit=22.5
2024-08-18 11:05:01,841 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.684e+01 2.313e+01 2.596e+01 3.019e+01 1.808e+02, threshold=5.192e+01, percent-clipped=2.0
2024-08-18 11:05:08,760 WARNING [optim.py:496] (3/4) Scaling gradients by 0.014583197422325611, model_norm_threshold=51.91682815551758
2024-08-18 11:05:08,927 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.923e+06, grad_sumsq=1.923e+06, orig_rms_sq=1.000e+00
2024-08-18 11:05:13,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.19 vs. limit=15.0
2024-08-18 11:05:25,862 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 21 from Vox, 17 from AS
2024-08-18 11:05:26,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13800, loss[loss=0.1154, beats_loss=0.006993, ecapa_loss=0.0002217, whisper_loss=0.1062, over 13857.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.000146, whisper_loss=0.09096, over 3866139.70 frames. ], batch size: 60, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:05:28,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3859230.0, ans=0.125
2024-08-18 11:05:30,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3859230.0, ans=0.0
2024-08-18 11:05:42,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3859330.0, ans=0.125
2024-08-18 11:05:44,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. limit=12.0
2024-08-18 11:05:46,196 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 34 from LS+wenet, 22 from Vox, 27 from AS
2024-08-18 11:05:48,588 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 19 from Vox, 33 from AS
2024-08-18 11:05:52,187 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 from AS
2024-08-18 11:06:00,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3859430.0, ans=0.04949747468305833
2024-08-18 11:06:02,867 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 from AS
2024-08-18 11:06:04,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3859430.0, ans=0.0
2024-08-18 11:06:05,769 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 21 from Vox, 32 from AS
2024-08-18 11:06:09,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0
2024-08-18 11:06:21,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3859630.0, ans=0.125
2024-08-18 11:06:22,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3859630.0, ans=0.125
2024-08-18 11:06:24,316 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 38 from LS+wenet, 27 from Vox, 26 from AS
2024-08-18 11:06:39,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13850, loss[loss=0.1198, beats_loss=0.01023, ecapa_loss=0.0001047, whisper_loss=0.1085, over 15078.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01049, ecapa_loss=0.0001463, whisper_loss=0.0913, over 3868278.81 frames. ], batch size: 58, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:06:57,063 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 22 from Vox, 20 from AS
2024-08-18 11:06:57,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3859830.0, ans=0.1
2024-08-18 11:07:09,083 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 15 from Vox, 37 from AS
2024-08-18 11:07:16,158 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 from AS
2024-08-18 11:07:32,257 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS
2024-08-18 11:07:39,490 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.397e+01 2.640e+01 3.060e+01 3.560e+03, threshold=5.281e+01, percent-clipped=3.0
2024-08-18 11:07:39,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3860030.0, ans=0.125
2024-08-18 11:07:57,520 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 16 from Vox, 32 from AS
2024-08-18 11:08:10,223 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13900, loss[loss=0.09487, beats_loss=0.01202, ecapa_loss=0.0001418, whisper_loss=0.08144, over 17005.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01046, ecapa_loss=0.0001458, whisper_loss=0.09142, over 3882297.45 frames. ], batch size: 69, lr: 2.34e-03, grad_scale: 1.152921504606847e+18
2024-08-18 11:08:12,517 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 19 from Vox, 36 from AS
2024-08-18 11:08:37,704 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS
2024-08-18 11:08:51,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3860430.0, ans=0.125
2024-08-18 11:08:57,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3860430.0, ans=0.2
2024-08-18 11:08:59,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3860430.0, ans=0.125
2024-08-18 11:09:03,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=12.0
2024-08-18 11:09:09,979 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 21 from Vox, 41 from AS
2024-08-18 11:09:38,815 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 25 from Vox, 45 from AS
2024-08-18 11:09:51,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 13950, loss[loss=0.09908, beats_loss=0.01049, ecapa_loss=0.0001255, whisper_loss=0.08733, over 18622.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01043, ecapa_loss=0.0001449, whisper_loss=0.09198, over 3931461.14 frames. ], batch size: 73, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:10:10,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3860730.0, ans=0.0
2024-08-18 11:10:24,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3860830.0, ans=0.0
2024-08-18 11:10:43,124 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 from AS
2024-08-18 11:10:59,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3861030.0, ans=0.125
2024-08-18 11:11:08,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.328e+01 2.570e+01 2.839e+01 4.020e+01, threshold=5.139e+01, percent-clipped=0.0
2024-08-18 11:11:41,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14000, loss[loss=0.1035, beats_loss=0.009568, ecapa_loss=0.000148, whisper_loss=0.09243, over 16494.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01044, ecapa_loss=0.0001441, whisper_loss=0.09198, over 3910801.98 frames. ], batch size: 64, lr: 2.34e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:12:04,547 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0
2024-08-18 11:12:24,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3861430.0, ans=0.0
2024-08-18 11:13:01,648 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 12 from Vox, 34 from AS
2024-08-18 11:13:28,664 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14050, loss[loss=0.09287, beats_loss=0.009066, ecapa_loss=0.0001907, whisper_loss=0.08189, over 15741.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09157, over 3907869.80 frames. ], batch size: 64, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:13:36,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3861730.0, ans=0.125
2024-08-18 11:13:53,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3861830.0, ans=0.0
2024-08-18 11:13:56,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3861830.0, ans=0.1
2024-08-18 11:14:03,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3861930.0, ans=0.125
2024-08-18 11:14:19,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.54 vs. limit=10.0
2024-08-18 11:14:23,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3862030.0, ans=0.125
2024-08-18 11:14:25,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.314e+01 2.539e+01 2.739e+01 6.784e+01, threshold=5.079e+01, percent-clipped=1.0
2024-08-18 11:14:31,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3862130.0, ans=0.1
2024-08-18 11:14:45,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2024-08-18 11:14:45,962 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14100, loss[loss=0.08424, beats_loss=0.01457, ecapa_loss=0.000109, whisper_loss=0.06858, over 15320.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001439, whisper_loss=0.09116, over 3884910.48 frames. ], batch size: 62, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:14:55,327 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0
2024-08-18 11:15:01,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3862330.0, ans=0.125
2024-08-18 11:15:03,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.56 vs. limit=22.5
2024-08-18 11:15:10,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3862330.0, ans=0.0
2024-08-18 11:15:17,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3862430.0, ans=0.1
2024-08-18 11:15:25,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3862430.0, ans=0.1
2024-08-18 11:15:56,421 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14150, loss[loss=0.1117, beats_loss=0.008946, ecapa_loss=0.000131, whisper_loss=0.1015, over 19669.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01056, ecapa_loss=0.0001446, whisper_loss=0.09127, over 3907621.90 frames. ], batch size: 77, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:16:04,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3862730.0, ans=0.5
2024-08-18 11:16:08,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3862730.0, ans=0.0
2024-08-18 11:16:08,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3862730.0, ans=0.125
2024-08-18 11:16:16,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3862830.0, ans=0.0
2024-08-18 11:16:22,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3862830.0, ans=0.0
2024-08-18 11:16:30,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3862930.0, ans=0.2
2024-08-18 11:16:47,828 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.320e+01 2.571e+01 2.843e+01 2.344e+02, threshold=5.141e+01, percent-clipped=2.0
2024-08-18 11:17:05,584 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 19 from Vox, 38 from AS
2024-08-18 11:17:05,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3863130.0, ans=0.125
2024-08-18 11:17:11,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14200, loss[loss=0.09983, beats_loss=0.01022, ecapa_loss=0.0001582, whisper_loss=0.08803, over 22597.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.09097, over 3892984.02 frames. ], batch size: 93, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:17:12,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3863230.0, ans=0.125
2024-08-18 11:17:15,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3863230.0, ans=0.1
2024-08-18 11:17:23,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.52 vs. limit=10.0
2024-08-18 11:17:36,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=15.0
2024-08-18 11:17:53,279 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.317e+01
2024-08-18 11:17:58,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3863530.0, ans=0.125
2024-08-18 11:18:02,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3863530.0, ans=10.0
2024-08-18 11:18:27,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14250, loss[loss=0.1081, beats_loss=0.009816, ecapa_loss=0.0001599, whisper_loss=0.09667, over 21795.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001438, whisper_loss=0.09114, over 3902969.67 frames. ], batch size: 89, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:18:58,722 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 17 from Vox, 30 from AS
2024-08-18 11:18:59,945 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-18 11:19:03,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3863930.0, ans=0.125
2024-08-18 11:19:09,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3863930.0, ans=0.125
2024-08-18 11:19:22,790 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 21 from Vox, 38 from AS
2024-08-18 11:19:23,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.369e+01 2.559e+01 2.841e+01 3.701e+01, threshold=5.118e+01, percent-clipped=0.0
2024-08-18 11:19:24,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3864030.0, ans=0.0
2024-08-18 11:19:30,061 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 17 from LS+wenet, 30 from Vox, 31 from AS
2024-08-18 11:19:34,504 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 from AS
2024-08-18 11:19:45,666 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14300, loss[loss=0.1145, beats_loss=0.009795, ecapa_loss=0.0001426, whisper_loss=0.1032, over 23329.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.000144, whisper_loss=0.09124, over 3892814.45 frames. ], batch size: 92, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:19:47,481 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS
2024-08-18 11:19:56,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3864230.0, ans=0.1
2024-08-18 11:20:02,741 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 20 from Vox, 44 from AS
2024-08-18 11:20:04,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3864330.0, ans=0.125
2024-08-18 11:20:11,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=3864330.0, ans=0.1
2024-08-18 11:20:29,652 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 16 from Vox, 26 from AS
2024-08-18 11:20:41,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3864530.0, ans=0.2
2024-08-18 11:20:43,319 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 19 from Vox, 43 from AS
2024-08-18 11:21:04,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14350, loss[loss=0.1005, beats_loss=0.01057, ecapa_loss=0.0001032, whisper_loss=0.08887, over 17585.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.09067, over 3888742.66 frames. ], batch size: 63, lr: 2.33e-03, grad_scale: 5.764607523034235e+17
2024-08-18 11:21:05,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.66 vs. limit=15.0
2024-08-18 11:21:06,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3864730.0, ans=0.125
2024-08-18 11:21:16,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3864730.0, ans=0.125
2024-08-18 11:21:16,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0
2024-08-18 11:21:30,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3864830.0, ans=0.0
2024-08-18 11:21:56,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=3865030.0, ans=0.05
2024-08-18 11:21:57,155 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.333e+01 2.599e+01 2.804e+01 6.490e+01, threshold=5.198e+01, percent-clipped=1.0
2024-08-18 11:21:57,296 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 18 from Vox, 47 from AS
2024-08-18 11:21:58,466 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 25 from Vox, 36 from AS
2024-08-18 11:22:03,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=12.0
2024-08-18 11:22:03,472 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 from AS
2024-08-18 11:22:06,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0
2024-08-18 11:22:10,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3865130.0, ans=0.2
2024-08-18 11:22:11,103 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 from AS
2024-08-18 11:22:15,673 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.91 vs. limit=22.5
2024-08-18 11:22:18,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.48 vs.
limit=15.0 2024-08-18 11:22:18,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14400, loss[loss=0.09128, beats_loss=0.009831, ecapa_loss=0.0001853, whisper_loss=0.07959, over 15311.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.000144, whisper_loss=0.091, over 3920567.78 frames. ], batch size: 64, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:22:26,492 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-18 11:22:28,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=3865230.0, ans=0.125 2024-08-18 11:22:37,873 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 18 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 11:22:39,299 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-18 11:22:52,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3865430.0, ans=0.0 2024-08-18 11:23:31,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 26, batch 14450, loss[loss=0.09384, beats_loss=0.0114, ecapa_loss=0.0001437, whisper_loss=0.081, over 16511.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01058, ecapa_loss=0.0001451, whisper_loss=0.09028, over 3910604.65 frames. ], batch size: 66, lr: 2.33e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:23:31,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=3865730.0, ans=0.025 2024-08-18 11:23:41,735 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 11:24:03,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3865930.0, ans=22.5 2024-08-18 11:24:04,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2024-08-18 11:24:08,178 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=5.239e+00 2024-08-18 11:24:11,774 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 11 from Vox, 28 fro AS 2024-08-18 11:24:19,451 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.265e+01 2.482e+01 2.875e+01 2.050e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-18 11:25:14,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 0, loss[loss=0.09873, beats_loss=0.01094, ecapa_loss=9.543e-05, whisper_loss=0.08683, over 15293.00 frames. ], tot_loss[loss=0.09873, beats_loss=0.01094, ecapa_loss=9.543e-05, whisper_loss=0.08683, over 15293.00 frames. ], batch size: 58, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:25:14,277 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 11:25:51,154 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005188, whisper_loss=0.2485, over 922467.00 frames. 2024-08-18 11:26:05,986 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on SV_voxceleb1: loss=0.004147, beats_loss=0, ecapa_loss=0.0004147, whisper_loss=0, over 939242.00 frames. 2024-08-18 11:27:48,110 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
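The `tot_loss` entries above combine three distillation losses into one scalar. A minimal sketch of that combination, assuming the loss scales stated in this run's config header (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`); the function name is illustrative, not icefall's API:

```python
# Assumed scales from the run config in the log header.
BEATS_SCALE, ECAPA_SCALE, WHISPER_SCALE = 1.0, 10.0, 1.0

def combine_kd_losses(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    """Weighted sum of the per-teacher distillation losses (hypothetical helper)."""
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Cross-check against one logged entry (Epoch 26, batch 14250):
# loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001438, whisper_loss=0.09114
print(round(combine_kd_losses(0.0105, 0.0001438, 0.09114), 4))  # → 0.1031
```

Note the `ecapa_loss` values in the log are roughly 100× smaller than the others, which is consistent with the 10.0 scale bringing its contribution into the same range as `beats_loss`.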
2024-08-18 11:27:48,113 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 11:27:55,210 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 11:27:59,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3866190.0, ans=0.125 2024-08-18 11:28:19,004 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.54 vs. limit=22.5 2024-08-18 11:28:23,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3866290.0, ans=0.125 2024-08-18 11:28:24,033 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 11:28:31,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3866290.0, ans=10.0 2024-08-18 11:28:33,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3866290.0, ans=0.0 2024-08-18 11:28:41,813 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 11:29:01,894 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2024-08-18 11:29:03,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3866490.0, ans=0.0 2024-08-18 11:29:25,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3866590.0, ans=0.125 2024-08-18 11:29:30,997 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
16 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-18 11:29:46,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 50, loss[loss=0.07693, beats_loss=0.01132, ecapa_loss=0.0001479, whisper_loss=0.06413, over 18167.00 frames. ], tot_loss[loss=0.09961, beats_loss=0.009386, ecapa_loss=0.0001479, whisper_loss=0.08875, over 871173.35 frames. ], batch size: 79, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:30:36,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3866890.0, ans=0.0 2024-08-18 11:30:51,154 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 11:31:11,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3866990.0, ans=0.1 2024-08-18 11:31:11,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3866990.0, ans=0.2 2024-08-18 11:31:11,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.004e+01 2.562e+01 2.806e+01 3.203e+01 5.774e+01, threshold=5.612e+01, percent-clipped=2.0 2024-08-18 11:31:17,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3867090.0, ans=10.0 2024-08-18 11:31:30,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3867090.0, ans=0.125 2024-08-18 11:31:35,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 100, loss[loss=0.09746, beats_loss=0.008251, ecapa_loss=0.0001715, whisper_loss=0.08749, over 20535.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.00945, ecapa_loss=0.0001463, whisper_loss=0.08935, over 1521468.55 frames. 
], batch size: 84, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:32:17,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3867390.0, ans=0.125 2024-08-18 11:32:45,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3867490.0, ans=0.0 2024-08-18 11:33:16,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 150, loss[loss=0.07725, beats_loss=0.009317, ecapa_loss=0.0001618, whisper_loss=0.06632, over 15486.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009184, ecapa_loss=0.0001478, whisper_loss=0.09175, over 2007308.43 frames. ], batch size: 63, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:33:20,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3867690.0, ans=0.035 2024-08-18 11:33:27,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3867690.0, ans=0.2 2024-08-18 11:33:39,017 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 11:33:52,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3867890.0, ans=0.1 2024-08-18 11:33:53,622 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 26 from Vox, 26 fro AS 2024-08-18 11:33:57,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3867890.0, ans=0.1 2024-08-18 11:34:05,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.19 vs. 
limit=6.0 2024-08-18 11:34:07,835 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. limit=15.0 2024-08-18 11:34:16,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.470e+01 2.705e+01 3.027e+01 2.809e+02, threshold=5.410e+01, percent-clipped=1.0 2024-08-18 11:34:18,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3868090.0, ans=0.125 2024-08-18 11:34:18,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3868090.0, ans=0.125 2024-08-18 11:34:33,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 200, loss[loss=0.1134, beats_loss=0.007511, ecapa_loss=0.0001564, whisper_loss=0.1043, over 17102.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009496, ecapa_loss=0.0001493, whisper_loss=0.09141, over 2377993.22 frames. ], batch size: 65, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:34:45,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3868190.0, ans=0.0 2024-08-18 11:34:47,430 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.501e-02 2024-08-18 11:35:03,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3868390.0, ans=0.125 2024-08-18 11:35:09,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. limit=10.0 2024-08-18 11:35:12,750 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. 
limit=15.0 2024-08-18 11:35:24,217 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 11 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-18 11:35:26,998 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 11:35:41,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-18 11:35:43,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 250, loss[loss=0.1104, beats_loss=0.01029, ecapa_loss=0.0001488, whisper_loss=0.09858, over 23281.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.009804, ecapa_loss=0.0001484, whisper_loss=0.09113, over 2704627.72 frames. ], batch size: 91, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:36:08,464 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 9 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 11:36:08,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3868790.0, ans=0.0 2024-08-18 11:36:11,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2024-08-18 11:36:12,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3868890.0, ans=0.125 2024-08-18 11:36:16,289 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 11:36:24,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=3868990.0, ans=0.5 2024-08-18 11:36:37,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.252e+01 2.480e+01 2.798e+01 3.781e+01, threshold=4.961e+01, percent-clipped=0.0 2024-08-18 11:36:52,183 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 300, loss[loss=0.107, beats_loss=0.008762, ecapa_loss=0.0001232, whisper_loss=0.097, over 15715.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.009865, ecapa_loss=0.0001466, whisper_loss=0.09035, over 2940322.27 frames. ], batch size: 58, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:36:57,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.32 vs. limit=15.0 2024-08-18 11:37:03,019 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 11:37:05,711 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-18 11:37:15,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3869290.0, ans=0.125 2024-08-18 11:37:21,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-08-18 11:37:21,969 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 11:37:33,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3869490.0, ans=0.0 2024-08-18 11:37:37,603 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
29 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 11:38:00,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 350, loss[loss=0.09968, beats_loss=0.009659, ecapa_loss=0.0001387, whisper_loss=0.08863, over 19365.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009997, ecapa_loss=0.0001446, whisper_loss=0.09005, over 3104977.97 frames. ], batch size: 74, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:38:15,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3869790.0, ans=0.125 2024-08-18 11:38:22,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3869790.0, ans=0.0 2024-08-18 11:38:22,415 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.430e+01 2024-08-18 11:38:28,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3869890.0, ans=0.0 2024-08-18 11:38:32,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.34 vs. limit=22.5 2024-08-18 11:38:36,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3869890.0, ans=0.05 2024-08-18 11:38:46,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3869990.0, ans=0.1 2024-08-18 11:38:48,491 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
26 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 11:38:49,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=3869990.0, ans=0.2 2024-08-18 11:38:53,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.198e+01 2.411e+01 2.717e+01 4.096e+01, threshold=4.822e+01, percent-clipped=0.0 2024-08-18 11:38:58,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3870090.0, ans=0.0 2024-08-18 11:39:03,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3870090.0, ans=0.125 2024-08-18 11:39:07,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 400, loss[loss=0.1162, beats_loss=0.01109, ecapa_loss=0.0001019, whisper_loss=0.104, over 21381.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01006, ecapa_loss=0.0001447, whisper_loss=0.09064, over 3276985.22 frames. ], batch size: 81, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:39:15,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3870190.0, ans=0.0 2024-08-18 11:39:26,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3870290.0, ans=0.0 2024-08-18 11:39:28,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3870290.0, ans=0.0 2024-08-18 11:39:32,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3870290.0, ans=0.125 2024-08-18 11:39:35,011 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
21 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 11:39:41,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2024-08-18 11:39:42,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3870390.0, ans=0.2 2024-08-18 11:40:05,170 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 11:40:08,892 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 11:40:12,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=3870590.0, ans=0.2 2024-08-18 11:40:15,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 450, loss[loss=0.1092, beats_loss=0.009697, ecapa_loss=0.0001109, whisper_loss=0.09842, over 17490.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01011, ecapa_loss=0.0001451, whisper_loss=0.09045, over 3379895.39 frames. ], batch size: 64, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:40:24,612 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 19 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 11:40:43,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3870890.0, ans=0.0 2024-08-18 11:40:47,020 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
24 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 11:40:47,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3870890.0, ans=0.125 2024-08-18 11:41:04,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3870990.0, ans=0.1 2024-08-18 11:41:08,578 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.334e+01 2.651e+01 3.139e+01 3.582e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-18 11:41:20,938 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-18 11:41:23,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 500, loss[loss=0.1264, beats_loss=0.007504, ecapa_loss=0.0001231, whisper_loss=0.1176, over 19321.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01017, ecapa_loss=0.0001445, whisper_loss=0.09051, over 3485420.00 frames. ], batch size: 69, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:41:32,397 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.71 vs. 
limit=22.5 2024-08-18 11:41:36,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3871290.0, ans=0.125 2024-08-18 11:41:42,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3871290.0, ans=0.05 2024-08-18 11:41:56,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3871390.0, ans=0.125 2024-08-18 11:42:01,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3871490.0, ans=0.125 2024-08-18 11:42:21,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3871590.0, ans=0.2 2024-08-18 11:42:24,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3871590.0, ans=0.0 2024-08-18 11:42:25,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=12.0 2024-08-18 11:42:28,767 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 550, loss[loss=0.08882, beats_loss=0.01024, ecapa_loss=0.0001644, whisper_loss=0.07694, over 18002.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01018, ecapa_loss=0.0001452, whisper_loss=0.09048, over 3578450.53 frames. 
], batch size: 77, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:42:59,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3871890.0, ans=0.125 2024-08-18 11:43:20,154 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.358e+01 2.658e+01 2.869e+01 1.652e+02, threshold=5.315e+01, percent-clipped=4.0 2024-08-18 11:43:22,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3872090.0, ans=0.1 2024-08-18 11:43:34,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 600, loss[loss=0.1013, beats_loss=0.01138, ecapa_loss=0.0001531, whisper_loss=0.08842, over 17533.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01023, ecapa_loss=0.0001441, whisper_loss=0.08979, over 3626269.07 frames. ], batch size: 72, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:43:43,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3872190.0, ans=0.125 2024-08-18 11:43:44,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3872190.0, ans=0.125 2024-08-18 11:44:00,395 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 11:44:22,472 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 11:44:22,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3872490.0, ans=0.0 2024-08-18 11:44:43,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 650, loss[loss=0.09922, beats_loss=0.01212, ecapa_loss=0.00015, whisper_loss=0.0856, over 20225.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01023, ecapa_loss=0.0001441, whisper_loss=0.09057, over 3677926.81 frames. ], batch size: 85, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:44:49,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3872690.0, ans=0.125 2024-08-18 11:45:27,942 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-18 11:45:29,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3872990.0, ans=0.05 2024-08-18 11:45:37,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.251e+01 2.525e+01 2.780e+01 3.539e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-18 11:45:38,604 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 11:45:51,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 700, loss[loss=0.09292, beats_loss=0.01171, ecapa_loss=0.000131, whisper_loss=0.07991, over 22074.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001435, whisper_loss=0.08976, over 3679335.52 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:46:05,851 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 11:46:21,824 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
17 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 11:46:23,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3873390.0, ans=0.125 2024-08-18 11:46:26,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3873390.0, ans=0.125 2024-08-18 11:46:27,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3873390.0, ans=0.125 2024-08-18 11:46:39,578 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-18 11:46:54,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3873590.0, ans=0.025 2024-08-18 11:46:56,334 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.731e-03 2024-08-18 11:46:59,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 750, loss[loss=0.06759, beats_loss=0.01059, ecapa_loss=0.0001678, whisper_loss=0.05532, over 14848.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01037, ecapa_loss=0.0001421, whisper_loss=0.08959, over 3736200.78 frames. ], batch size: 62, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:47:03,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.74 vs. 
limit=12.0 2024-08-18 11:47:24,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3873790.0, ans=0.2 2024-08-18 11:47:33,204 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=6.750e+01 2024-08-18 11:47:38,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3873890.0, ans=0.125 2024-08-18 11:47:38,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-18 11:47:44,539 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 19 from LS+wenet, 23 from Vox, 53 fro AS 2024-08-18 11:47:53,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.240e+01 2.488e+01 2.790e+01 6.250e+01, threshold=4.975e+01, percent-clipped=1.0 2024-08-18 11:47:54,871 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 11:48:01,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3874090.0, ans=0.07 2024-08-18 11:48:04,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2024-08-18 11:48:07,633 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 800, loss[loss=0.09633, beats_loss=0.01018, ecapa_loss=0.0001745, whisper_loss=0.08441, over 22335.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01046, ecapa_loss=0.0001429, whisper_loss=0.08843, over 3772869.36 frames. 
], batch size: 91, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:48:08,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3874190.0, ans=0.1 2024-08-18 11:48:10,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2024-08-18 11:48:10,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.65 vs. limit=10.0 2024-08-18 11:48:17,636 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.69 vs. limit=15.0 2024-08-18 11:48:24,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.78 vs. limit=10.0 2024-08-18 11:48:26,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3874290.0, ans=0.125 2024-08-18 11:48:29,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3874290.0, ans=0.1 2024-08-18 11:48:42,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3874390.0, ans=0.125 2024-08-18 11:49:01,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3874490.0, ans=0.0 2024-08-18 11:49:10,695 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 from AS 2024-08-18 11:49:17,197 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 850, loss[loss=0.1025, beats_loss=0.01204, ecapa_loss=0.0001321, whisper_loss=0.08916, over 22409.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001426, whisper_loss=0.08882, over 3789418.45 frames. ], batch size: 91, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:49:22,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=3874690.0, ans=0.2 2024-08-18 11:49:50,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3874890.0, ans=0.025 2024-08-18 11:49:57,654 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08740431070327759, model_norm_threshold=49.75291061401367 2024-08-18 11:49:57,819 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.157e+04, grad_sumsq=5.157e+04, orig_rms_sq=1.000e+00 2024-08-18 11:50:00,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3874990.0, ans=0.0 2024-08-18 11:50:07,594 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 11 from Vox, 23 from AS 2024-08-18 11:50:11,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.23 vs. limit=10.0 2024-08-18 11:50:11,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.302e+01 2.593e+01 2.885e+01 5.692e+02, threshold=5.187e+01, percent-clipped=2.0 2024-08-18 11:50:14,435 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
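The `WARNING [optim.py:496] Scaling gradients by 0.0874..., model_norm_threshold=49.75...` entry above is consistent with a simple norm-based scaling rule: multiply the gradients by `threshold / grad_norm` whenever the gradient norm exceeds the threshold. A minimal sketch of that rule, an inference from the logged numbers rather than the actual `optim.py` code:

```python
# Assumed gradient-scaling rule inferred from the WARNING line (not the
# actual icefall optim.py implementation): gradients are scaled by
# threshold / grad_norm when grad_norm > threshold, else left alone.
# The logged pair (threshold=49.75291061401367, scale=0.08740431070327759)
# implies a gradient norm of about 5.692e+02, the same value that appears
# as the grad-norm quartile maximum on this log line.
def clipping_scale(grad_norm: float, threshold: float) -> float:
    """Factor by which gradients are scaled down (1.0 means no clipping)."""
    return min(1.0, threshold / grad_norm)

scale = clipping_scale(5.692e+02, 49.75291061401367)
print(f"scale for grad norm 5.692e+02: {scale:.4f}")  # ~0.0874, matching the WARNING
```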
19 from LS+wenet, 15 from Vox, 20 from AS 2024-08-18 11:50:18,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3875090.0, ans=0.125 2024-08-18 11:50:20,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2024-08-18 11:50:22,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3875090.0, ans=0.125 2024-08-18 11:50:27,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 900, loss[loss=0.1107, beats_loss=0.01033, ecapa_loss=0.0001536, whisper_loss=0.09884, over 22350.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01041, ecapa_loss=0.0001424, whisper_loss=0.08883, over 3800301.51 frames. ], batch size: 90, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:50:37,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3875190.0, ans=0.1 2024-08-18 11:50:37,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3875190.0, ans=0.0 2024-08-18 11:50:41,096 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 18 from Vox, 32 from AS 2024-08-18 11:50:51,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3875290.0, ans=0.1 2024-08-18 11:50:53,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3875290.0, ans=0.2 2024-08-18 11:50:58,362 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 11:51:01,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.74 vs. limit=15.0 2024-08-18 11:51:16,062 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 from AS 2024-08-18 11:51:30,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3875590.0, ans=0.125 2024-08-18 11:51:36,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-18 11:51:36,853 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 950, loss[loss=0.1133, beats_loss=0.009523, ecapa_loss=0.0001727, whisper_loss=0.102, over 17135.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.08861, over 3794845.26 frames. 
], batch size: 71, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:51:51,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3875790.0, ans=0.125 2024-08-18 11:51:58,148 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.974e+01 2024-08-18 11:52:02,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3875790.0, ans=0.2 2024-08-18 11:52:13,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3875890.0, ans=0.0 2024-08-18 11:52:14,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=3875890.0, ans=0.025 2024-08-18 11:52:16,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3875890.0, ans=0.125 2024-08-18 11:52:28,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3875990.0, ans=0.1 2024-08-18 11:52:28,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3875990.0, ans=0.0 2024-08-18 11:52:30,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.296e+01 2.523e+01 2.746e+01 1.713e+02, threshold=5.046e+01, percent-clipped=1.0 2024-08-18 11:52:36,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3876090.0, ans=0.125 2024-08-18 11:52:46,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1000, loss[loss=0.1027, beats_loss=0.01072, ecapa_loss=0.0001248, whisper_loss=0.09069, over 22149.00 frames. 
], tot_loss[loss=0.1004, beats_loss=0.0104, ecapa_loss=0.0001414, whisper_loss=0.08858, over 3805756.91 frames. ], batch size: 89, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:53:02,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=22.5 2024-08-18 11:53:08,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3876290.0, ans=0.125 2024-08-18 11:53:27,997 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 24 from Vox, 28 from AS 2024-08-18 11:53:35,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3876490.0, ans=0.0 2024-08-18 11:53:37,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.01 vs. limit=15.0 2024-08-18 11:53:43,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3876590.0, ans=0.125 2024-08-18 11:53:48,398 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 11:53:57,818 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1050, loss[loss=0.1055, beats_loss=0.01131, ecapa_loss=0.0001274, whisper_loss=0.0929, over 23080.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01039, ecapa_loss=0.0001423, whisper_loss=0.08883, over 3832814.44 frames. 
], batch size: 92, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:54:10,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3876690.0, ans=0.125 2024-08-18 11:54:10,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3876690.0, ans=0.2 2024-08-18 11:54:32,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3876890.0, ans=0.125 2024-08-18 11:54:39,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=12.0 2024-08-18 11:54:53,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3876990.0, ans=0.0 2024-08-18 11:54:54,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.342e+01 2.695e+01 2.899e+01 4.255e+01, threshold=5.389e+01, percent-clipped=0.0 2024-08-18 11:55:02,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3877090.0, ans=0.1 2024-08-18 11:55:08,967 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 11:55:10,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1100, loss[loss=0.1043, beats_loss=0.00997, ecapa_loss=0.0001394, whisper_loss=0.09292, over 21000.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.000143, whisper_loss=0.08965, over 3849550.46 frames. ], batch size: 83, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:55:21,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3877190.0, ans=0.125 2024-08-18 11:55:53,371 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 11:56:23,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1150, loss[loss=0.0935, beats_loss=0.01323, ecapa_loss=0.0001255, whisper_loss=0.07902, over 19546.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01031, ecapa_loss=0.0001428, whisper_loss=0.08986, over 3838952.99 frames. ], batch size: 79, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:56:35,962 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2024-08-18 11:56:36,357 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 21 from Vox, 23 from AS 2024-08-18 11:56:50,415 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 from AS 2024-08-18 11:57:06,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3877990.0, ans=0.125 2024-08-18 11:57:14,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3877990.0, ans=0.125 2024-08-18 11:57:19,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.346e+01 2.566e+01 2.901e+01 4.362e+01, threshold=5.132e+01, percent-clipped=0.0 2024-08-18 11:57:34,937 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1200, loss[loss=0.09812, beats_loss=0.009426, ecapa_loss=0.0001546, whisper_loss=0.08715, over 16931.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01034, ecapa_loss=0.0001425, whisper_loss=0.08994, over 3800196.73 frames. 
], batch size: 70, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:57:53,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3878290.0, ans=0.0 2024-08-18 11:57:54,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3878290.0, ans=0.2 2024-08-18 11:57:58,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3878290.0, ans=0.125 2024-08-18 11:58:12,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3878390.0, ans=0.0 2024-08-18 11:58:27,828 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 10 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 11:58:31,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=3878590.0, ans=0.125 2024-08-18 11:58:34,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3878590.0, ans=0.125 2024-08-18 11:58:46,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1250, loss[loss=0.08106, beats_loss=0.01328, ecapa_loss=0.0001487, whisper_loss=0.06629, over 13461.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.08898, over 3815350.70 frames. ], batch size: 54, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 11:58:51,652 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
27 from LS+wenet, 21 from Vox, 26 from AS 2024-08-18 11:58:52,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3878690.0, ans=0.035 2024-08-18 11:59:02,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3878790.0, ans=0.0 2024-08-18 11:59:14,720 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 11:59:35,016 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 11:59:36,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3878990.0, ans=0.125 2024-08-18 11:59:38,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2024-08-18 11:59:43,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.277e+01 2.561e+01 2.801e+01 1.202e+02, threshold=5.122e+01, percent-clipped=2.0 2024-08-18 11:59:50,920 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 19 from Vox, 36 from AS 2024-08-18 11:59:55,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3879090.0, ans=0.0 2024-08-18 11:59:59,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1300, loss[loss=0.08147, beats_loss=0.01152, ecapa_loss=0.0001808, whisper_loss=0.06814, over 12910.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01048, ecapa_loss=0.0001422, whisper_loss=0.0894, over 3828095.53 frames. ], batch size: 56, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:00:21,592 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
26 from LS+wenet, 22 from Vox, 31 from AS 2024-08-18 12:00:22,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-08-18 12:00:24,556 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=8.784e+00 2024-08-18 12:00:25,968 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 12:00:34,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3879390.0, ans=0.125 2024-08-18 12:00:37,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3879390.0, ans=0.125 2024-08-18 12:00:44,738 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 from AS 2024-08-18 12:00:47,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3879490.0, ans=0.1 2024-08-18 12:00:50,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3879490.0, ans=0.0 2024-08-18 12:00:55,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3879490.0, ans=0.0 2024-08-18 12:01:12,038 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1350, loss[loss=0.1222, beats_loss=0.007483, ecapa_loss=0.0001695, whisper_loss=0.113, over 15901.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001422, whisper_loss=0.0902, over 3858852.29 frames. 
], batch size: 62, lr: 2.29e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:01:17,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3879690.0, ans=0.125 2024-08-18 12:01:24,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3879690.0, ans=0.1 2024-08-18 12:01:37,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3879790.0, ans=0.0 2024-08-18 12:01:49,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3879890.0, ans=0.0 2024-08-18 12:01:53,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3879890.0, ans=0.0 2024-08-18 12:02:10,709 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 from AS 2024-08-18 12:02:15,305 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.267e+01 2.490e+01 2.849e+01 7.961e+01, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 12:02:26,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3880090.0, ans=0.0 2024-08-18 12:02:31,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1400, loss[loss=0.09017, beats_loss=0.0122, ecapa_loss=0.0001276, whisper_loss=0.0767, over 19288.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.08929, over 3859381.87 frames. ], batch size: 80, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:02:39,507 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
30 from LS+wenet, 14 from Vox, 31 from AS 2024-08-18 12:02:54,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=3880290.0, ans=22.5 2024-08-18 12:03:11,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=3880390.0, ans=0.05 2024-08-18 12:03:15,902 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:03:18,286 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 26 from LS+wenet, 6 from Vox, 33 from AS 2024-08-18 12:03:19,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=12.0 2024-08-18 12:03:33,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3880490.0, ans=0.125 2024-08-18 12:03:41,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3880590.0, ans=0.125 2024-08-18 12:04:14,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1450, loss[loss=0.1155, beats_loss=0.009448, ecapa_loss=0.0001236, whisper_loss=0.1048, over 24451.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.08994, over 3846323.56 frames. ], batch size: 94, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:04:19,121 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 21 from Vox, 33 from AS 2024-08-18 12:04:22,195 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
22 from LS+wenet, 23 from Vox, 36 from AS 2024-08-18 12:04:46,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=3880890.0, ans=0.2 2024-08-18 12:04:47,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=3880890.0, ans=0.125 2024-08-18 12:04:53,277 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 from AS 2024-08-18 12:05:02,761 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 25 from Vox, 43 from AS 2024-08-18 12:05:09,980 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 from AS 2024-08-18 12:05:14,251 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.279e+01 2.479e+01 2.751e+01 4.183e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-18 12:05:30,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1500, loss[loss=0.08937, beats_loss=0.009334, ecapa_loss=0.0001398, whisper_loss=0.07864, over 23119.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.08931, over 3827495.09 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:05:31,516 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 20 from Vox, 32 from AS 2024-08-18 12:05:32,902 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 17 from Vox, 22 from AS 2024-08-18 12:05:35,860 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 12:05:44,410 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.49 vs. 
limit=22.5 2024-08-18 12:05:56,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3881290.0, ans=0.125 2024-08-18 12:06:26,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3881490.0, ans=0.1 2024-08-18 12:06:31,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.89 vs. limit=12.0 2024-08-18 12:06:44,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1550, loss[loss=0.09759, beats_loss=0.01166, ecapa_loss=0.0001263, whisper_loss=0.08467, over 17575.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.0001414, whisper_loss=0.08948, over 3810062.84 frames. ], batch size: 69, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:07:02,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3881790.0, ans=0.125 2024-08-18 12:07:15,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3881890.0, ans=0.04949747468305833 2024-08-18 12:07:21,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3881890.0, ans=0.125 2024-08-18 12:07:23,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. 
limit=15.0 2024-08-18 12:07:27,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3881890.0, ans=0.1 2024-08-18 12:07:31,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3881990.0, ans=0.1 2024-08-18 12:07:36,765 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 14 from Vox, 29 from AS 2024-08-18 12:07:42,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3881990.0, ans=0.0 2024-08-18 12:07:45,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.226e+01 2.364e+01 2.655e+01 3.408e+01, threshold=4.729e+01, percent-clipped=0.0 2024-08-18 12:07:46,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3882090.0, ans=0.1 2024-08-18 12:07:47,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.06 vs. limit=12.0 2024-08-18 12:07:56,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3882090.0, ans=0.025 2024-08-18 12:08:00,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1600, loss[loss=0.09439, beats_loss=0.01039, ecapa_loss=0.0001092, whisper_loss=0.08291, over 16940.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001404, whisper_loss=0.08906, over 3801643.07 frames. 
], batch size: 63, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:08:29,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3882290.0, ans=0.0 2024-08-18 12:08:38,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3882390.0, ans=0.0 2024-08-18 12:08:42,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3882390.0, ans=0.125 2024-08-18 12:08:46,194 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 from AS 2024-08-18 12:08:51,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-08-18 12:09:06,907 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.135e+01 2024-08-18 12:09:16,014 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1650, loss[loss=0.08799, beats_loss=0.01279, ecapa_loss=0.0001045, whisper_loss=0.07416, over 14495.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08948, over 3807073.08 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:09:25,767 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 from AS 2024-08-18 12:09:39,540 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 from AS 2024-08-18 12:09:41,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3882790.0, ans=0.125 2024-08-18 12:09:47,274 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
28 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 12:10:13,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.497e+01 2.361e+01 2.617e+01 2.894e+01 3.984e+01, threshold=5.235e+01, percent-clipped=0.0 2024-08-18 12:10:16,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3883090.0, ans=0.125 2024-08-18 12:10:27,992 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1700, loss[loss=0.1003, beats_loss=0.01092, ecapa_loss=0.0001492, whisper_loss=0.08789, over 22728.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01034, ecapa_loss=0.0001408, whisper_loss=0.09024, over 3844695.26 frames. ], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:10:50,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883290.0, ans=0.125 2024-08-18 12:10:55,261 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 13 from Vox, 28 from AS 2024-08-18 12:11:09,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3883490.0, ans=0.2 2024-08-18 12:11:16,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3883490.0, ans=0.0 2024-08-18 12:11:17,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3883490.0, ans=0.125 2024-08-18 12:11:38,439 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1750, loss[loss=0.08179, beats_loss=0.0133, ecapa_loss=0.0001345, whisper_loss=0.06715, over 19270.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.09013, over 3860385.98 frames. ], batch size: 81, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:11:44,063 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
19 from LS+wenet, 25 from Vox, 36 from AS 2024-08-18 12:11:45,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3883690.0, ans=0.125 2024-08-18 12:11:46,167 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.12 vs. limit=6.0 2024-08-18 12:11:59,876 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 11 from LS+wenet, 23 from Vox, 29 from AS 2024-08-18 12:12:08,481 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 12:12:10,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3883890.0, ans=0.1 2024-08-18 12:12:11,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3883890.0, ans=0.0 2024-08-18 12:12:30,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3883990.0, ans=0.2 2024-08-18 12:12:31,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.82 vs. limit=15.0 2024-08-18 12:12:34,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.272e+01 2.549e+01 2.826e+01 1.079e+02, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 12:12:36,424 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 22 from Vox, 31 from AS 2024-08-18 12:12:48,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1800, loss[loss=0.09082, beats_loss=0.01163, ecapa_loss=0.0001231, whisper_loss=0.07796, over 23157.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01028, ecapa_loss=0.000141, whisper_loss=0.09024, over 3817581.36 frames. 
], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:12:52,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3884190.0, ans=0.0 2024-08-18 12:12:53,898 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 12:13:07,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2024-08-18 12:13:13,785 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 17 from Vox, 41 from AS 2024-08-18 12:13:18,359 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 23 from Vox, 18 from AS 2024-08-18 12:13:31,159 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 17 from Vox, 24 from AS 2024-08-18 12:13:42,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3884490.0, ans=0.125 2024-08-18 12:13:58,578 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1850, loss[loss=0.1152, beats_loss=0.01056, ecapa_loss=0.0001374, whisper_loss=0.1032, over 22460.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01029, ecapa_loss=0.0001413, whisper_loss=0.09029, over 3838209.63 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:14:00,721 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-08-18 12:14:01,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3884690.0, ans=0.125 2024-08-18 12:14:10,042 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
23 from LS+wenet, 12 from Vox, 31 from AS 2024-08-18 12:14:33,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3884890.0, ans=0.125 2024-08-18 12:14:43,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3884990.0, ans=0.0 2024-08-18 12:14:54,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.251e+01 2.486e+01 2.812e+01 3.198e+02, threshold=4.971e+01, percent-clipped=2.0 2024-08-18 12:14:56,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=12.0 2024-08-18 12:15:08,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1900, loss[loss=0.1236, beats_loss=0.009475, ecapa_loss=0.0001183, whisper_loss=0.1129, over 20486.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.0001407, whisper_loss=0.08981, over 3838677.60 frames. ], batch size: 76, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:15:10,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3885190.0, ans=0.0 2024-08-18 12:15:23,629 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 from AS 2024-08-18 12:15:25,103 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 21 from Vox, 31 from AS 2024-08-18 12:15:51,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3885490.0, ans=0.0 2024-08-18 12:15:52,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3885490.0, ans=0.125 2024-08-18 12:15:57,295 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
32 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 12:16:00,127 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 from AS 2024-08-18 12:16:03,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3885590.0, ans=0.1 2024-08-18 12:16:17,368 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 1950, loss[loss=0.1083, beats_loss=0.008204, ecapa_loss=0.000143, whisper_loss=0.09869, over 15038.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.08953, over 3835725.83 frames. ], batch size: 56, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:16:25,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3885690.0, ans=0.0 2024-08-18 12:16:25,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-08-18 12:16:33,840 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.01 vs. limit=12.0 2024-08-18 12:16:55,501 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts.
15 from LS+wenet, 17 from Vox, 32 from AS 2024-08-18 12:17:06,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3885990.0, ans=0.125 2024-08-18 12:17:11,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3885990.0, ans=0.125 2024-08-18 12:17:14,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.298e+01 2.562e+01 2.933e+01 2.107e+02, threshold=5.124e+01, percent-clipped=3.0 2024-08-18 12:17:18,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3886090.0, ans=0.125 2024-08-18 12:17:26,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3886090.0, ans=0.125 2024-08-18 12:17:28,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3886190.0, ans=0.2 2024-08-18 12:17:29,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2000, loss[loss=0.1052, beats_loss=0.009712, ecapa_loss=0.0001474, whisper_loss=0.09406, over 16878.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01042, ecapa_loss=0.0001404, whisper_loss=0.08893, over 3841464.87 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:17:44,933 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 12:17:51,062 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
17 from LS+wenet, 22 from Vox, 21 from AS 2024-08-18 12:17:55,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=3886290.0, ans=0.0 2024-08-18 12:18:00,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-18 12:18:02,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3886390.0, ans=0.0 2024-08-18 12:18:13,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3886490.0, ans=0.0 2024-08-18 12:18:17,820 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 16 from Vox, 32 from AS 2024-08-18 12:18:36,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=3886590.0, ans=0.05 2024-08-18 12:18:36,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3886590.0, ans=0.125 2024-08-18 12:18:40,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2050, loss[loss=0.1011, beats_loss=0.009558, ecapa_loss=0.0001615, whisper_loss=0.08991, over 22000.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01048, ecapa_loss=0.00014, whisper_loss=0.08828, over 3839872.88 frames.
], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:19:00,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3886790.0, ans=0.2 2024-08-18 12:19:09,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3886890.0, ans=0.125 2024-08-18 12:19:36,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.270e+01 2.543e+01 2.879e+01 5.540e+01, threshold=5.086e+01, percent-clipped=1.0 2024-08-18 12:19:38,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3887090.0, ans=0.5 2024-08-18 12:19:40,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3887090.0, ans=0.125 2024-08-18 12:19:50,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2100, loss[loss=0.07279, beats_loss=0.01439, ecapa_loss=0.0001731, whisper_loss=0.05667, over 17886.00 frames. ], tot_loss[loss=0.09979, beats_loss=0.01056, ecapa_loss=0.0001387, whisper_loss=0.08785, over 3816485.44 frames. ], batch size: 77, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:19:58,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3887190.0, ans=0.1 2024-08-18 12:20:03,958 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts.
18 from LS+wenet, 19 from Vox, 28 from AS 2024-08-18 12:20:07,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3887290.0, ans=0.2 2024-08-18 12:20:25,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3887390.0, ans=0.1 2024-08-18 12:20:37,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3887490.0, ans=0.0 2024-08-18 12:20:41,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3887490.0, ans=0.125 2024-08-18 12:20:49,798 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 16 from Vox, 35 from AS 2024-08-18 12:20:58,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=3887590.0, ans=6.0 2024-08-18 12:20:59,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2150, loss[loss=0.09857, beats_loss=0.01275, ecapa_loss=0.000161, whisper_loss=0.08421, over 18268.00 frames. ], tot_loss[loss=0.09977, beats_loss=0.01073, ecapa_loss=0.0001373, whisper_loss=0.08766, over 3828593.40 frames. ], batch size: 79, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:21:03,427 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 20 from Vox, 38 from AS 2024-08-18 12:21:17,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3887790.0, ans=0.2 2024-08-18 12:21:24,331 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 30 from Vox, 27 from AS 2024-08-18 12:21:54,792 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.65 vs.
limit=15.0 2024-08-18 12:21:58,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.293e+01 2.488e+01 2.854e+01 6.681e+01, threshold=4.977e+01, percent-clipped=1.0 2024-08-18 12:22:10,766 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2024-08-18 12:22:12,560 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2200, loss[loss=0.09323, beats_loss=0.01008, ecapa_loss=0.0001376, whisper_loss=0.08178, over 15120.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01066, ecapa_loss=0.0001394, whisper_loss=0.08855, over 3846957.05 frames. ], batch size: 61, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:22:17,357 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 15 from LS+wenet, 13 from Vox, 28 from AS 2024-08-18 12:22:32,842 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 12:22:49,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2024-08-18 12:23:01,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3888490.0, ans=0.1 2024-08-18 12:23:23,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3888690.0, ans=0.1 2024-08-18 12:23:23,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2250, loss[loss=0.08611, beats_loss=0.01203, ecapa_loss=0.0001256, whisper_loss=0.07283, over 18006.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01071, ecapa_loss=0.0001395, whisper_loss=0.0887, over 3850159.84 frames. ], batch size: 71, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:23:24,474 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
23 from LS+wenet, 24 from Vox, 28 from AS 2024-08-18 12:23:43,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3888790.0, ans=0.5 2024-08-18 12:23:49,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3888790.0, ans=0.2 2024-08-18 12:23:53,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2024-08-18 12:24:19,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3888990.0, ans=0.0 2024-08-18 12:24:26,489 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.345e+01 2.551e+01 2.862e+01 1.228e+02, threshold=5.102e+01, percent-clipped=1.0 2024-08-18 12:24:37,670 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 from AS 2024-08-18 12:24:42,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2300, loss[loss=0.09682, beats_loss=0.01031, ecapa_loss=0.0001461, whisper_loss=0.08505, over 18001.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01066, ecapa_loss=0.0001397, whisper_loss=0.08889, over 3830138.64 frames. ], batch size: 73, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:25:34,804 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.548e-03 2024-08-18 12:25:47,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3889590.0, ans=0.125 2024-08-18 12:25:49,561 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs.
limit=15.0 2024-08-18 12:26:00,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. limit=15.0 2024-08-18 12:26:01,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2350, loss[loss=0.116, beats_loss=0.008869, ecapa_loss=0.0001608, whisper_loss=0.1056, over 22350.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.0001417, whisper_loss=0.08929, over 3822512.02 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:26:01,748 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 20 from Vox, 41 from AS 2024-08-18 12:26:04,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3889690.0, ans=0.95 2024-08-18 12:26:22,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3889790.0, ans=0.1 2024-08-18 12:26:27,062 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=15.0 2024-08-18 12:26:29,244 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 22 from Vox, 19 from AS 2024-08-18 12:26:38,677 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 22 from Vox, 44 from AS 2024-08-18 12:26:52,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs.
limit=15.0 2024-08-18 12:27:04,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.417e+01 2.635e+01 3.019e+01 1.167e+02, threshold=5.271e+01, percent-clipped=1.0 2024-08-18 12:27:18,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2400, loss[loss=0.0886, beats_loss=0.0123, ecapa_loss=0.0001401, whisper_loss=0.0749, over 21190.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001421, whisper_loss=0.08951, over 3855540.48 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:27:28,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=3890190.0, ans=15.0 2024-08-18 12:27:38,787 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 20 from Vox, 20 from AS 2024-08-18 12:27:41,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.22 vs. limit=10.0 2024-08-18 12:27:53,023 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 from AS 2024-08-18 12:27:59,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.45 vs. limit=15.0 2024-08-18 12:28:03,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3890490.0, ans=0.1 2024-08-18 12:28:11,005 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 12:28:11,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3890490.0, ans=0.0 2024-08-18 12:28:30,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2450, loss[loss=0.08871, beats_loss=0.01124, ecapa_loss=0.0001367, whisper_loss=0.07611, over 21620.00 frames.
], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.08941, over 3882003.56 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:28:33,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3890690.0, ans=0.125 2024-08-18 12:28:53,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3890790.0, ans=0.2 2024-08-18 12:28:54,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3890790.0, ans=0.0 2024-08-18 12:29:15,954 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 16 from Vox, 25 from AS 2024-08-18 12:29:20,104 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 20 from Vox, 34 from AS 2024-08-18 12:29:24,149 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 24 from Vox, 15 from AS 2024-08-18 12:29:25,637 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 from AS 2024-08-18 12:29:27,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.266e+01 2.434e+01 2.760e+01 4.670e+01, threshold=4.867e+01, percent-clipped=0.0 2024-08-18 12:29:31,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-08-18 12:29:36,647 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=3891090.0, ans=0.1 2024-08-18 12:29:38,861 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 18 from LS+wenet, 29 from Vox, 44 from AS 2024-08-18 12:29:42,808 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2500, loss[loss=0.07056, beats_loss=0.01061, ecapa_loss=0.0001509, whisper_loss=0.05844, over 16173.00 frames.
], tot_loss[loss=0.1017, beats_loss=0.01044, ecapa_loss=0.0001433, whisper_loss=0.08982, over 3863899.67 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:29:50,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3891190.0, ans=0.0 2024-08-18 12:29:57,213 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 12:30:27,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3891490.0, ans=0.0 2024-08-18 12:30:33,836 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 28 from Vox, 39 from AS 2024-08-18 12:30:51,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2550, loss[loss=0.1012, beats_loss=0.009911, ecapa_loss=0.000145, whisper_loss=0.08987, over 21689.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.08942, over 3868772.89 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:31:06,986 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts.
15 from LS+wenet, 19 from Vox, 32 from AS 2024-08-18 12:31:09,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3891790.0, ans=0.125 2024-08-18 12:31:13,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3891790.0, ans=0.0 2024-08-18 12:31:27,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3891890.0, ans=0.125 2024-08-18 12:31:43,211 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.325e+01 2.537e+01 2.934e+01 3.751e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 12:31:47,153 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 26 from Vox, 39 from AS 2024-08-18 12:31:54,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3892190.0, ans=0.0 2024-08-18 12:31:54,918 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.650e+01 2024-08-18 12:31:55,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2600, loss[loss=0.08285, beats_loss=0.0102, ecapa_loss=0.000153, whisper_loss=0.07111, over 14675.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.08959, over 3847986.36 frames. ], batch size: 60, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:31:58,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3892190.0, ans=0.125 2024-08-18 12:32:02,190 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
20 from LS+wenet, 21 from Vox, 27 from AS 2024-08-18 12:32:57,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2650, loss[loss=0.08201, beats_loss=0.01476, ecapa_loss=0.0001429, whisper_loss=0.06581, over 18171.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.000142, whisper_loss=0.09032, over 3850718.37 frames. ], batch size: 78, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:33:13,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=3892790.0, ans=0.2 2024-08-18 12:33:34,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3892890.0, ans=0.0 2024-08-18 12:33:37,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3892990.0, ans=0.125 2024-08-18 12:33:48,373 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.994e+01 2.397e+01 2.618e+01 2.808e+01 4.074e+01, threshold=5.236e+01, percent-clipped=0.0 2024-08-18 12:33:51,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.78 vs. limit=10.0 2024-08-18 12:33:51,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.97 vs. limit=15.0 2024-08-18 12:33:58,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3893090.0, ans=0.0 2024-08-18 12:34:00,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2700, loss[loss=0.1041, beats_loss=0.0113, ecapa_loss=0.0001329, whisper_loss=0.09148, over 22195.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001416, whisper_loss=0.0896, over 3851207.46 frames.
], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:34:14,769 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 27 from LS+wenet, 24 from Vox, 44 from AS 2024-08-18 12:34:16,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3893290.0, ans=0.2 2024-08-18 12:34:27,383 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 24 from Vox, 21 from AS 2024-08-18 12:34:39,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3893490.0, ans=0.0 2024-08-18 12:34:39,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.90 vs. limit=22.5 2024-08-18 12:34:41,118 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 12:34:59,897 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 from AS 2024-08-18 12:35:03,442 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2750, loss[loss=0.1026, beats_loss=0.01173, ecapa_loss=9.226e-05, whisper_loss=0.08993, over 21831.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.0892, over 3838933.51 frames. ], batch size: 79, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:35:05,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3893690.0, ans=0.0 2024-08-18 12:35:16,343 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 21 from Vox, 24 from AS 2024-08-18 12:35:32,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3893890.0, ans=0.0 2024-08-18 12:35:34,019 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts.
23 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 12:35:38,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3893890.0, ans=0.125 2024-08-18 12:35:39,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3893890.0, ans=0.2 2024-08-18 12:35:48,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2024-08-18 12:35:53,876 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.272e+01 2.417e+01 2.652e+01 4.888e+01, threshold=4.834e+01, percent-clipped=0.0 2024-08-18 12:35:59,855 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 12:36:06,031 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2800, loss[loss=0.0925, beats_loss=0.01249, ecapa_loss=0.0001279, whisper_loss=0.07874, over 18615.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001416, whisper_loss=0.08993, over 3848471.48 frames. ], batch size: 77, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:36:19,877 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 from AS 2024-08-18 12:36:21,138 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 22 from Vox, 21 from AS 2024-08-18 12:36:25,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3894290.0, ans=0.125 2024-08-18 12:36:26,235 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 17 from Vox, 35 from AS 2024-08-18 12:36:55,097 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts.
25 from LS+wenet, 26 from Vox, 31 from AS 2024-08-18 12:37:08,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2850, loss[loss=0.106, beats_loss=0.009898, ecapa_loss=0.0001758, whisper_loss=0.09434, over 20949.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.000142, whisper_loss=0.0898, over 3842127.70 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:37:10,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3894690.0, ans=0.2 2024-08-18 12:37:19,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3894790.0, ans=0.1 2024-08-18 12:37:32,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-08-18 12:37:32,903 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 12:37:57,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.309e+01 2.623e+01 2.921e+01 3.884e+01, threshold=5.245e+01, percent-clipped=0.0 2024-08-18 12:38:09,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2900, loss[loss=0.1017, beats_loss=0.007361, ecapa_loss=0.0001579, whisper_loss=0.09274, over 18312.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001419, whisper_loss=0.08978, over 3831793.20 frames.
], batch size: 71, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:38:11,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3895190.0, ans=0.0 2024-08-18 12:38:12,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3895190.0, ans=0.0 2024-08-18 12:38:37,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=3895390.0, ans=0.95 2024-08-18 12:38:56,398 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 26 from Vox, 34 from AS 2024-08-18 12:39:04,174 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 15 from Vox, 34 from AS 2024-08-18 12:39:10,089 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 23 from Vox, 44 from AS 2024-08-18 12:39:10,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3895690.0, ans=0.125 2024-08-18 12:39:11,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 2950, loss[loss=0.0893, beats_loss=0.0118, ecapa_loss=0.0001362, whisper_loss=0.07613, over 22061.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001424, whisper_loss=0.09015, over 3852730.42 frames. ], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:39:24,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3895790.0, ans=0.1 2024-08-18 12:39:25,288 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 27 from Vox, 39 from AS 2024-08-18 12:39:27,817 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts.
15 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 12:39:29,600 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-18 12:39:30,295 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 15 from Vox, 51 fro AS 2024-08-18 12:39:42,733 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2024-08-18 12:39:53,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3895990.0, ans=0.1 2024-08-18 12:39:55,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3895990.0, ans=0.125 2024-08-18 12:40:01,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.273e+01 2.616e+01 2.939e+01 5.806e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-18 12:40:14,054 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.48 vs. limit=12.0 2024-08-18 12:40:14,609 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3000, loss[loss=0.1171, beats_loss=0.008301, ecapa_loss=0.0001281, whisper_loss=0.1075, over 19577.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0106, ecapa_loss=0.0001418, whisper_loss=0.09002, over 3888516.51 frames. ], batch size: 72, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:40:14,610 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 12:40:51,442 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.000526, whisper_loss=0.2482, over 922467.00 frames. 
2024-08-18 12:41:08,057 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on SV_voxceleb1: loss=0.003954, beats_loss=0, ecapa_loss=0.0003954, whisper_loss=0, over 939242.00 frames. 2024-08-18 12:42:59,113 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 12:42:59,117 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 12:43:06,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2024-08-18 12:43:20,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-08-18 12:43:31,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3896390.0, ans=0.04949747468305833 2024-08-18 12:43:37,727 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 12:43:49,567 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:43:54,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.29 vs. limit=22.5 2024-08-18 12:43:55,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3896590.0, ans=0.125 2024-08-18 12:44:01,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3050, loss[loss=0.1122, beats_loss=0.01019, ecapa_loss=0.0001576, whisper_loss=0.1005, over 21461.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.000143, whisper_loss=0.09058, over 3901674.90 frames. 
], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:44:12,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3896790.0, ans=0.125 2024-08-18 12:44:13,991 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-18 12:44:31,227 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 12:44:51,086 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.420e+01 2.665e+01 2.949e+01 2.105e+02, threshold=5.329e+01, percent-clipped=1.0 2024-08-18 12:44:56,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.22 vs. limit=10.0 2024-08-18 12:44:58,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3897090.0, ans=0.0 2024-08-18 12:45:00,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3897090.0, ans=0.125 2024-08-18 12:45:03,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3100, loss[loss=0.1044, beats_loss=0.01272, ecapa_loss=0.0001457, whisper_loss=0.09024, over 19636.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01059, ecapa_loss=0.000144, whisper_loss=0.09085, over 3901853.60 frames. ], batch size: 81, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:45:15,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3897290.0, ans=0.2 2024-08-18 12:45:19,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2024-08-18 12:45:23,907 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 12:45:25,055 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-18 12:45:36,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3897390.0, ans=0.125 2024-08-18 12:45:58,600 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 12:46:06,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3150, loss[loss=0.1198, beats_loss=0.00954, ecapa_loss=0.0001332, whisper_loss=0.1089, over 23844.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01062, ecapa_loss=0.0001439, whisper_loss=0.09057, over 3893341.30 frames. ], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:46:20,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.65 vs. limit=10.0 2024-08-18 12:46:21,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3897790.0, ans=0.125 2024-08-18 12:46:27,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2024-08-18 12:46:34,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=3897890.0, ans=0.025 2024-08-18 12:46:48,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3897990.0, ans=0.125 2024-08-18 12:46:54,326 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 12:46:56,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.328e+01 2.486e+01 2.838e+01 3.960e+01, threshold=4.973e+01, percent-clipped=0.0 2024-08-18 12:46:59,258 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 12:47:02,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3898090.0, ans=0.125 2024-08-18 12:47:08,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3200, loss[loss=0.09583, beats_loss=0.01098, ecapa_loss=0.0001085, whisper_loss=0.08376, over 17609.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.09179, over 3873094.94 frames. ], batch size: 67, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:47:13,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3898190.0, ans=0.1 2024-08-18 12:47:18,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3898190.0, ans=0.0 2024-08-18 12:47:23,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3898290.0, ans=0.125 2024-08-18 12:47:28,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.96 vs. 
limit=10.0 2024-08-18 12:47:30,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3898290.0, ans=0.125 2024-08-18 12:47:36,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=3898390.0, ans=0.05 2024-08-18 12:47:39,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3898390.0, ans=0.1 2024-08-18 12:47:41,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3898390.0, ans=0.125 2024-08-18 12:47:41,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3898390.0, ans=0.125 2024-08-18 12:47:47,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3898490.0, ans=0.05 2024-08-18 12:47:50,093 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 12:47:54,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3898490.0, ans=0.125 2024-08-18 12:48:08,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=3898590.0, ans=0.2 2024-08-18 12:48:11,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3250, loss[loss=0.09545, beats_loss=0.008971, ecapa_loss=0.0001892, whisper_loss=0.08459, over 20848.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01055, ecapa_loss=0.0001443, whisper_loss=0.09111, over 3869673.48 frames. 
], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:48:14,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3898690.0, ans=0.125 2024-08-18 12:48:30,212 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 12:48:31,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3898790.0, ans=0.125 2024-08-18 12:48:36,754 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 12:48:40,303 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 12:48:41,420 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 12:48:42,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=3898890.0, ans=10.0 2024-08-18 12:48:51,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3898990.0, ans=0.0 2024-08-18 12:49:00,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.280e+01 2.573e+01 2.891e+01 1.155e+02, threshold=5.145e+01, percent-clipped=3.0 2024-08-18 12:49:13,371 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3300, loss[loss=0.06994, beats_loss=0.01531, ecapa_loss=0.0001295, whisper_loss=0.05334, over 21098.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01069, ecapa_loss=0.0001448, whisper_loss=0.09038, over 3885370.49 frames. 
], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:49:13,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3899190.0, ans=0.125 2024-08-18 12:49:16,889 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 12:49:24,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3899290.0, ans=0.025 2024-08-18 12:49:29,332 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-18 12:49:32,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3899290.0, ans=0.125 2024-08-18 12:49:38,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.99 vs. limit=12.0 2024-08-18 12:49:41,716 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 12:49:45,371 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 12:49:59,151 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 12:50:15,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3350, loss[loss=0.1213, beats_loss=0.008629, ecapa_loss=0.0001559, whisper_loss=0.1111, over 23228.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01066, ecapa_loss=0.0001448, whisper_loss=0.09071, over 3913601.97 frames. ], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:50:15,253 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 12:50:16,476 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 12:50:16,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3899690.0, ans=0.125 2024-08-18 12:50:19,903 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 12:50:31,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3899790.0, ans=0.125 2024-08-18 12:50:32,366 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-18 12:50:43,683 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 28 from LS+wenet, 29 from Vox, 39 fro AS 2024-08-18 12:50:46,168 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-18 12:50:49,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3899890.0, ans=0.125 2024-08-18 12:50:53,271 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 12:51:04,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.398e+01 2.647e+01 2.975e+01 4.321e+02, threshold=5.295e+01, percent-clipped=5.0 2024-08-18 12:51:11,664 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 12:51:11,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3900090.0, ans=0.0 2024-08-18 12:51:15,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3900190.0, ans=0.125 2024-08-18 12:51:16,219 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3400, loss[loss=0.1075, beats_loss=0.01075, ecapa_loss=0.0001224, whisper_loss=0.09555, over 14393.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001463, whisper_loss=0.09105, over 3906223.69 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:51:24,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-18 12:51:30,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3900290.0, ans=0.125 2024-08-18 12:51:46,325 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-18 12:51:56,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3900490.0, ans=0.5 2024-08-18 12:51:56,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3900490.0, ans=0.125 2024-08-18 12:52:06,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3900590.0, ans=0.0 2024-08-18 12:52:11,788 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 12:52:12,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3900590.0, ans=0.2 2024-08-18 12:52:16,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3450, loss[loss=0.1104, beats_loss=0.01003, ecapa_loss=0.0001483, whisper_loss=0.09892, over 17405.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01065, ecapa_loss=0.0001452, whisper_loss=0.08968, over 3917691.70 frames. ], batch size: 68, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:52:18,265 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 12:52:23,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3900690.0, ans=0.2 2024-08-18 12:52:33,079 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 12:52:46,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3900890.0, ans=0.0 2024-08-18 12:52:46,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3900890.0, ans=0.0 2024-08-18 12:52:47,356 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-18 12:53:08,508 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 12:53:09,720 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-18 12:53:10,845 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.235e+01 2.457e+01 2.725e+01 3.914e+01, threshold=4.915e+01, percent-clipped=0.0 2024-08-18 12:53:12,813 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 12:53:23,760 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 12:53:29,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3500, loss[loss=0.08966, beats_loss=0.009963, ecapa_loss=0.000163, whisper_loss=0.07806, over 22117.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001447, whisper_loss=0.09055, over 3922441.61 frames. 
], batch size: 95, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:53:32,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3901190.0, ans=0.2 2024-08-18 12:53:36,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3901190.0, ans=0.125 2024-08-18 12:53:52,348 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 12:53:56,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3901290.0, ans=0.125 2024-08-18 12:53:57,373 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 12:54:03,597 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 12:54:16,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3901490.0, ans=0.0 2024-08-18 12:54:47,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3550, loss[loss=0.1017, beats_loss=0.008779, ecapa_loss=0.0001768, whisper_loss=0.0912, over 13887.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01057, ecapa_loss=0.0001437, whisper_loss=0.08958, over 3874162.13 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 1.152921504606847e+18 2024-08-18 12:54:47,566 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 13 from Vox, 41 fro AS 2024-08-18 12:54:57,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2024-08-18 12:55:12,354 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 12:55:22,567 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
34 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 12:55:26,810 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 12:55:36,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3901990.0, ans=0.1 2024-08-18 12:55:45,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3901990.0, ans=0.0 2024-08-18 12:55:50,573 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.348e+01 2.623e+01 2.938e+01 4.839e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-18 12:56:05,427 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3600, loss[loss=0.1079, beats_loss=0.00845, ecapa_loss=0.0001398, whisper_loss=0.0981, over 14410.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001429, whisper_loss=0.08968, over 3860442.20 frames. ], batch size: 55, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:56:07,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3902190.0, ans=0.125 2024-08-18 12:56:37,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3902390.0, ans=0.125 2024-08-18 12:56:37,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3902390.0, ans=0.04949747468305833 2024-08-18 12:56:39,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.45 vs. 
limit=10.0 2024-08-18 12:56:47,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3902390.0, ans=0.125 2024-08-18 12:56:54,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3902490.0, ans=0.1 2024-08-18 12:56:58,911 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 12:57:02,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.55 vs. limit=15.0 2024-08-18 12:57:25,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3902590.0, ans=0.1 2024-08-18 12:57:25,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3902590.0, ans=0.125 2024-08-18 12:57:30,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3650, loss[loss=0.08847, beats_loss=0.008749, ecapa_loss=0.000144, whisper_loss=0.07829, over 15331.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0106, ecapa_loss=0.0001418, whisper_loss=0.08976, over 3852396.55 frames. ], batch size: 62, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:57:36,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3902690.0, ans=0.125 2024-08-18 12:57:53,385 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-18 12:57:54,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3902890.0, ans=0.125 2024-08-18 12:58:15,451 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 12:58:21,440 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.261e+01 2.422e+01 2.681e+01 4.543e+01, threshold=4.845e+01, percent-clipped=0.0 2024-08-18 12:58:33,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3700, loss[loss=0.1014, beats_loss=0.009074, ecapa_loss=0.0001526, whisper_loss=0.09085, over 22033.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001433, whisper_loss=0.09024, over 3860077.31 frames. ], batch size: 86, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 12:58:39,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3903190.0, ans=0.0 2024-08-18 12:58:57,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3903290.0, ans=0.125 2024-08-18 12:59:04,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3903390.0, ans=0.125 2024-08-18 12:59:13,552 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 12:59:16,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3903490.0, ans=0.09899494936611666 2024-08-18 12:59:27,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-18 12:59:41,517 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3750, loss[loss=0.1062, beats_loss=0.008806, ecapa_loss=0.000164, whisper_loss=0.09579, over 18227.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.000144, whisper_loss=0.08988, over 3871824.38 frames. 
], batch size: 73, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:00:01,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3903790.0, ans=0.125 2024-08-18 13:00:29,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3903990.0, ans=0.05 2024-08-18 13:00:41,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3903990.0, ans=0.0 2024-08-18 13:00:44,055 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 13:00:45,065 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.840e+01 2.286e+01 2.566e+01 2.907e+01 8.320e+01, threshold=5.133e+01, percent-clipped=1.0 2024-08-18 13:00:46,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3904090.0, ans=0.0 2024-08-18 13:00:53,124 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:00:58,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3904190.0, ans=0.125 2024-08-18 13:00:59,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3800, loss[loss=0.09989, beats_loss=0.008722, ecapa_loss=0.0001579, whisper_loss=0.08959, over 16705.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001453, whisper_loss=0.08959, over 3863539.55 frames. ], batch size: 65, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:01:04,380 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 13:01:11,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3904190.0, ans=0.125 2024-08-18 13:01:16,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3904290.0, ans=0.125 2024-08-18 13:01:17,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3904290.0, ans=0.125 2024-08-18 13:01:19,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=12.0 2024-08-18 13:02:06,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3904590.0, ans=0.07 2024-08-18 13:02:13,865 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3850, loss[loss=0.1085, beats_loss=0.01201, ecapa_loss=9.718e-05, whisper_loss=0.09547, over 16325.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01062, ecapa_loss=0.0001445, whisper_loss=0.08962, over 3860263.14 frames. 
], batch size: 60, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:02:22,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3904690.0, ans=0.125 2024-08-18 13:02:22,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3904690.0, ans=0.125 2024-08-18 13:02:48,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3904890.0, ans=0.0 2024-08-18 13:02:54,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3904890.0, ans=0.0 2024-08-18 13:03:14,618 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.318e+01 2.599e+01 3.015e+01 2.305e+02, threshold=5.197e+01, percent-clipped=2.0 2024-08-18 13:03:15,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3905090.0, ans=0.125 2024-08-18 13:03:22,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3905090.0, ans=0.125 2024-08-18 13:03:28,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3900, loss[loss=0.1105, beats_loss=0.00909, ecapa_loss=0.0001452, whisper_loss=0.09995, over 19587.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001449, whisper_loss=0.08962, over 3839201.72 frames. ], batch size: 75, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:03:34,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-08-18 13:03:36,880 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 17 from Vox, 25 from AS 2024-08-18 13:03:40,011 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 from AS 2024-08-18 13:03:44,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3905290.0, ans=0.2 2024-08-18 13:03:53,425 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 25 from Vox, 33 from AS 2024-08-18 13:03:59,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3905390.0, ans=0.125 2024-08-18 13:04:01,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3905390.0, ans=0.125 2024-08-18 13:04:07,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3905390.0, ans=0.1 2024-08-18 13:04:10,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3905390.0, ans=0.0 2024-08-18 13:04:11,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=3905390.0, ans=0.0 2024-08-18 13:04:22,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3905490.0, ans=0.125 2024-08-18 13:04:22,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=3905490.0, ans=0.2 2024-08-18 13:04:24,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.82 vs.
limit=15.0 2024-08-18 13:04:36,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3905590.0, ans=0.0 2024-08-18 13:04:39,089 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 16 from Vox, 22 from AS 2024-08-18 13:04:43,565 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 3950, loss[loss=0.09369, beats_loss=0.01182, ecapa_loss=0.0001329, whisper_loss=0.08054, over 21843.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001446, whisper_loss=0.08976, over 3818486.97 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:04:47,534 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 24 from LS+wenet, 28 from Vox, 44 from AS 2024-08-18 13:05:26,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3905990.0, ans=0.015 2024-08-18 13:05:30,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3905990.0, ans=0.0 2024-08-18 13:05:33,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3905990.0, ans=0.125 2024-08-18 13:05:45,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.363e+01 2.538e+01 2.990e+01 4.778e+02, threshold=5.076e+01, percent-clipped=2.0 2024-08-18 13:05:50,882 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 26 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 13:05:58,268 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4000, loss[loss=0.09426, beats_loss=0.01231, ecapa_loss=0.0001274, whisper_loss=0.08067, over 24242.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001448, whisper_loss=0.08962, over 3860209.74 frames. ], batch size: 97, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:05:58,382 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
33 from LS+wenet, 28 from Vox, 28 from AS 2024-08-18 13:06:06,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3906190.0, ans=0.1 2024-08-18 13:06:10,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=3906190.0, ans=0.125 2024-08-18 13:06:10,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3906190.0, ans=0.2 2024-08-18 13:06:48,114 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 17 from Vox, 28 from AS 2024-08-18 13:06:55,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.29 vs. limit=22.5 2024-08-18 13:06:58,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=3906590.0, ans=22.5 2024-08-18 13:07:10,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3906590.0, ans=0.0 2024-08-18 13:07:11,461 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 13:07:14,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4050, loss[loss=0.1039, beats_loss=0.008083, ecapa_loss=0.0001558, whisper_loss=0.09423, over 18765.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001455, whisper_loss=0.09036, over 3860415.04 frames. ], batch size: 70, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:07:24,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.03 vs. limit=22.5 2024-08-18 13:07:38,828 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
36 from LS+wenet, 18 from Vox, 34 from AS 2024-08-18 13:07:44,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3906890.0, ans=0.07 2024-08-18 13:07:48,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3906890.0, ans=0.125 2024-08-18 13:07:48,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=3906890.0, ans=10.0 2024-08-18 13:07:55,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3906890.0, ans=0.0 2024-08-18 13:08:01,465 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=3906990.0, ans=0.125 2024-08-18 13:08:15,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.302e+01 2.524e+01 2.887e+01 1.698e+02, threshold=5.047e+01, percent-clipped=2.0 2024-08-18 13:08:27,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2024-08-18 13:08:28,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4100, loss[loss=0.1221, beats_loss=0.007925, ecapa_loss=0.000164, whisper_loss=0.1126, over 19582.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001451, whisper_loss=0.09069, over 3849574.10 frames. ], batch size: 76, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:08:29,024 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 from AS 2024-08-18 13:08:31,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3907190.0, ans=0.1 2024-08-18 13:08:34,568 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 13:08:44,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3907290.0, ans=0.1 2024-08-18 13:08:48,622 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 30 from Vox, 40 from AS 2024-08-18 13:08:54,459 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 22 from Vox, 50 from AS 2024-08-18 13:08:55,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3907290.0, ans=0.125 2024-08-18 13:08:59,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3907390.0, ans=0.125 2024-08-18 13:09:15,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3907490.0, ans=0.1 2024-08-18 13:09:16,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=3907490.0, ans=0.125 2024-08-18 13:09:16,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3907490.0, ans=0.125 2024-08-18 13:09:45,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.59 vs. limit=15.0 2024-08-18 13:09:45,572 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4150, loss[loss=0.1171, beats_loss=0.009946, ecapa_loss=0.0001368, whisper_loss=0.1058, over 22041.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001457, whisper_loss=0.09074, over 3868357.64 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:09:53,020 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
24 from LS+wenet, 24 from Vox, 43 from AS 2024-08-18 13:10:04,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3907790.0, ans=0.0 2024-08-18 13:10:28,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3907990.0, ans=0.2 2024-08-18 13:10:32,553 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 from AS 2024-08-18 13:10:44,882 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.199e+01 2.501e+01 2.835e+01 5.919e+01, threshold=5.001e+01, percent-clipped=1.0 2024-08-18 13:10:44,995 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 17 from Vox, 23 from AS 2024-08-18 13:10:54,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3908090.0, ans=0.09899494936611666 2024-08-18 13:10:57,449 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 18 from Vox, 40 from AS 2024-08-18 13:10:58,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4200, loss[loss=0.1183, beats_loss=0.01077, ecapa_loss=0.0001235, whisper_loss=0.1063, over 24505.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001445, whisper_loss=0.09084, over 3882530.55 frames.
], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:11:00,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3908190.0, ans=0.0 2024-08-18 13:11:09,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3908190.0, ans=0.2 2024-08-18 13:11:17,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3908290.0, ans=0.125 2024-08-18 13:11:22,074 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.0 2024-08-18 13:11:22,614 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-18 13:11:27,485 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 from AS 2024-08-18 13:11:39,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.05 vs.
limit=10.0 2024-08-18 13:11:55,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3908490.0, ans=0.125 2024-08-18 13:12:00,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3908590.0, ans=0.2 2024-08-18 13:12:08,513 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06405556201934814, model_norm_threshold=50.014076232910156 2024-08-18 13:12:08,688 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.868e+05, grad_sumsq=1.868e+05, orig_rms_sq=1.000e+00 2024-08-18 13:12:12,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2024-08-18 13:12:14,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4250, loss[loss=0.1144, beats_loss=0.009156, ecapa_loss=0.0001416, whisper_loss=0.1038, over 18279.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.09119, over 3897366.04 frames. ], batch size: 72, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:12:20,163 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
29 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 13:12:20,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3908690.0, ans=0.125 2024-08-18 13:12:26,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3908690.0, ans=0.0 2024-08-18 13:13:05,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3908990.0, ans=0.125 2024-08-18 13:13:15,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3909090.0, ans=0.125 2024-08-18 13:13:16,004 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 13:13:17,493 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.252e+01 2.550e+01 2.768e+01 7.808e+02, threshold=5.101e+01, percent-clipped=1.0 2024-08-18 13:13:31,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4300, loss[loss=0.1008, beats_loss=0.0111, ecapa_loss=0.0001529, whisper_loss=0.08818, over 22688.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01044, ecapa_loss=0.0001446, whisper_loss=0.09113, over 3877310.65 frames. ], batch size: 93, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:13:35,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3909190.0, ans=0.125 2024-08-18 13:13:36,857 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
29 from LS+wenet, 28 from Vox, 31 from AS 2024-08-18 13:13:40,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3909190.0, ans=0.0 2024-08-18 13:13:49,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3909290.0, ans=0.1 2024-08-18 13:13:51,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3909290.0, ans=0.0 2024-08-18 13:13:58,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3909290.0, ans=0.0 2024-08-18 13:14:07,486 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:14:23,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3909490.0, ans=0.125 2024-08-18 13:14:27,517 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:14:35,084 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 13:14:42,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3909590.0, ans=0.125 2024-08-18 13:14:48,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4350, loss[loss=0.1029, beats_loss=0.0112, ecapa_loss=0.0001315, whisper_loss=0.09041, over 22291.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01041, ecapa_loss=0.0001449, whisper_loss=0.09122, over 3863402.15 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:15:05,570 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts.
20 from LS+wenet, 17 from Vox, 33 from AS 2024-08-18 13:15:14,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.49 vs. limit=15.0 2024-08-18 13:15:14,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3909790.0, ans=0.125 2024-08-18 13:15:29,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=12.0 2024-08-18 13:15:32,082 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.64 vs. limit=15.0 2024-08-18 13:15:40,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3909990.0, ans=0.125 2024-08-18 13:15:51,272 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.300e+01 2.560e+01 2.936e+01 6.147e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 13:15:53,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3910090.0, ans=0.125 2024-08-18 13:15:53,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3910090.0, ans=0.2 2024-08-18 13:16:00,459 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 13:16:05,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4400, loss[loss=0.09086, beats_loss=0.01318, ecapa_loss=0.0001208, whisper_loss=0.07647, over 22178.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.0001442, whisper_loss=0.09114, over 3868564.18 frames. ], batch size: 89, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:16:05,556 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts.
17 from LS+wenet, 14 from Vox, 26 from AS 2024-08-18 13:16:07,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3910190.0, ans=0.125 2024-08-18 13:16:09,611 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 20 from Vox, 28 from AS 2024-08-18 13:16:19,841 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 13:16:32,414 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03998252749443054, model_norm_threshold=51.19541549682617 2024-08-18 13:16:32,579 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.970e+05, grad_sumsq=3.970e+05, orig_rms_sq=1.000e+00 2024-08-18 13:16:37,267 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 13:17:02,996 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 12 from Vox, 45 from AS 2024-08-18 13:17:16,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3910590.0, ans=0.1 2024-08-18 13:17:20,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3910590.0, ans=0.0 2024-08-18 13:17:21,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3910590.0, ans=0.125 2024-08-18 13:17:24,188 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4450, loss[loss=0.09968, beats_loss=0.01037, ecapa_loss=0.0001783, whisper_loss=0.08753, over 20966.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001453, whisper_loss=0.09066, over 3849224.28 frames.
], batch size: 90, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:17:30,970 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 25 from Vox, 28 from AS 2024-08-18 13:17:37,451 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.70 vs. limit=6.0 2024-08-18 13:17:54,972 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.854e-01 2024-08-18 13:18:14,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3910990.0, ans=0.0 2024-08-18 13:18:24,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3910990.0, ans=10.0 2024-08-18 13:18:30,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.358e+01 2.721e+01 3.082e+01 1.280e+03, threshold=5.441e+01, percent-clipped=5.0 2024-08-18 13:18:35,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3911090.0, ans=0.1 2024-08-18 13:18:40,683 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 22 from Vox, 39 from AS 2024-08-18 13:18:43,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4500, loss[loss=0.0986, beats_loss=0.01105, ecapa_loss=0.0001279, whisper_loss=0.08627, over 14861.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001453, whisper_loss=0.09076, over 3846727.58 frames. ], batch size: 57, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:18:46,477 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts.
23 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 13:19:08,902 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:19:15,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3911390.0, ans=0.125 2024-08-18 13:19:41,091 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:20:00,654 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4550, loss[loss=0.106, beats_loss=0.0111, ecapa_loss=0.0001584, whisper_loss=0.09334, over 16178.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001457, whisper_loss=0.09116, over 3880653.03 frames. ], batch size: 66, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:20:22,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0 2024-08-18 13:20:34,687 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 from AS 2024-08-18 13:20:36,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3911890.0, ans=0.0 2024-08-18 13:20:49,505 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 34 from LS+wenet, 15 from Vox, 32 from AS 2024-08-18 13:21:03,586 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.646e+01 2.293e+01 2.530e+01 2.882e+01 1.902e+02, threshold=5.061e+01, percent-clipped=1.0 2024-08-18 13:21:09,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3912090.0, ans=0.0 2024-08-18 13:21:14,441 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-18 13:21:17,478 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4600, loss[loss=0.1089, beats_loss=0.009221, ecapa_loss=0.0001305, whisper_loss=0.09838, over 19592.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01036, ecapa_loss=0.0001457, whisper_loss=0.09181, over 3910126.84 frames. ], batch size: 72, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:21:30,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3912190.0, ans=0.125 2024-08-18 13:21:44,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3912290.0, ans=0.1 2024-08-18 13:21:49,387 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 15 from Vox, 21 from AS 2024-08-18 13:22:04,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=3912490.0, ans=0.05 2024-08-18 13:22:08,390 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 from AS 2024-08-18 13:22:19,060 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 25 from Vox, 33 from AS 2024-08-18 13:22:30,694 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 13:22:33,920 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4650, loss[loss=0.1114, beats_loss=0.006413, ecapa_loss=0.0002074, whisper_loss=0.1029, over 16490.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001453, whisper_loss=0.09146, over 3894379.59 frames.
], batch size: 71, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:22:38,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3912690.0, ans=0.0 2024-08-18 13:22:46,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=15.0 2024-08-18 13:22:54,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.62 vs. limit=22.5 2024-08-18 13:22:55,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3912790.0, ans=0.0 2024-08-18 13:23:35,605 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.611e+01 2.195e+01 2.467e+01 2.773e+01 3.878e+01, threshold=4.935e+01, percent-clipped=0.0 2024-08-18 13:23:49,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4700, loss[loss=0.1075, beats_loss=0.009133, ecapa_loss=0.0001457, whisper_loss=0.09692, over 22762.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01044, ecapa_loss=0.0001441, whisper_loss=0.09142, over 3901720.39 frames. ], batch size: 88, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:23:54,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3913190.0, ans=0.0 2024-08-18 13:24:27,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3913390.0, ans=0.125 2024-08-18 13:24:28,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3913390.0, ans=0.125 2024-08-18 13:24:32,755 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
34 from LS+wenet, 17 from Vox, 37 from AS 2024-08-18 13:24:33,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3913390.0, ans=0.125 2024-08-18 13:24:42,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3913490.0, ans=0.125 2024-08-18 13:24:51,324 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 13:24:51,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3913590.0, ans=0.0 2024-08-18 13:24:54,109 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 14 from Vox, 35 from AS 2024-08-18 13:24:55,538 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 from AS 2024-08-18 13:25:05,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4750, loss[loss=0.1016, beats_loss=0.01173, ecapa_loss=0.0001275, whisper_loss=0.08855, over 23220.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01044, ecapa_loss=0.0001437, whisper_loss=0.09147, over 3914334.23 frames. ], batch size: 91, lr: 2.28e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:25:07,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3913690.0, ans=0.0 2024-08-18 13:25:16,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3913690.0, ans=0.125 2024-08-18 13:25:18,864 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
28 from LS+wenet, 16 from Vox, 44 from AS 2024-08-18 13:25:24,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3913790.0, ans=0.125 2024-08-18 13:25:43,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3913890.0, ans=0.1 2024-08-18 13:25:51,522 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 16 from Vox, 20 from AS 2024-08-18 13:26:04,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.30 vs. limit=15.0 2024-08-18 13:26:06,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3914090.0, ans=0.125 2024-08-18 13:26:07,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.285e+01 2.505e+01 2.813e+01 4.108e+01, threshold=5.010e+01, percent-clipped=0.0 2024-08-18 13:26:10,303 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2024-08-18 13:26:11,049 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 from AS 2024-08-18 13:26:13,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2024-08-18 13:26:15,528 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 14 from Vox, 24 from AS 2024-08-18 13:26:21,775 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4800, loss[loss=0.0981, beats_loss=0.01003, ecapa_loss=0.0001715, whisper_loss=0.08635, over 21397.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001435, whisper_loss=0.0912, over 3912001.80 frames.
], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:27:01,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3914390.0, ans=0.125 2024-08-18 13:27:01,278 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-08-18 13:27:03,842 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.09 vs. limit=15.0 2024-08-18 13:27:21,397 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 13 from Vox, 21 from AS 2024-08-18 13:27:26,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3914590.0, ans=0.125 2024-08-18 13:27:27,665 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 from AS 2024-08-18 13:27:28,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3914590.0, ans=0.2 2024-08-18 13:27:37,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4850, loss[loss=0.07156, beats_loss=0.01191, ecapa_loss=0.0001199, whisper_loss=0.05845, over 19824.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01054, ecapa_loss=0.0001428, whisper_loss=0.09079, over 3903310.71 frames.
], batch size: 82, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:27:48,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3914690.0, ans=0.09899494936611666 2024-08-18 13:27:55,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3914790.0, ans=0.0 2024-08-18 13:27:56,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3914790.0, ans=0.125 2024-08-18 13:28:06,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3914890.0, ans=0.1 2024-08-18 13:28:18,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3914890.0, ans=0.125 2024-08-18 13:28:23,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3914990.0, ans=0.0 2024-08-18 13:28:35,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.142e+01 2.395e+01 2.645e+01 2.966e+01 4.545e+01, threshold=5.290e+01, percent-clipped=0.0 2024-08-18 13:28:37,297 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 16 from Vox, 24 from AS 2024-08-18 13:28:47,430 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 from AS 2024-08-18 13:28:48,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4900, loss[loss=0.08516, beats_loss=0.01213, ecapa_loss=0.0001525, whisper_loss=0.07151, over 21308.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001442, whisper_loss=0.09031, over 3869394.64 frames. 
], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:29:12,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=3915290.0, ans=0.125 2024-08-18 13:29:24,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3915390.0, ans=0.0 2024-08-18 13:29:36,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3915490.0, ans=0.125 2024-08-18 13:29:44,496 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 27 from LS+wenet, 11 from Vox, 22 from AS 2024-08-18 13:29:49,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3915590.0, ans=0.125 2024-08-18 13:30:00,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=3915590.0, ans=0.035 2024-08-18 13:30:05,164 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 4950, loss[loss=0.09299, beats_loss=0.008413, ecapa_loss=0.0001475, whisper_loss=0.0831, over 16108.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.08946, over 3849860.13 frames. 
], batch size: 61, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:30:05,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3915690.0, ans=0.2 2024-08-18 13:30:21,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3915790.0, ans=0.125 2024-08-18 13:30:39,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3915890.0, ans=0.0 2024-08-18 13:30:42,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=3915890.0, ans=22.5 2024-08-18 13:30:49,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3915890.0, ans=0.0 2024-08-18 13:30:52,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0 2024-08-18 13:31:08,366 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.718e+01 2.264e+01 2.536e+01 2.797e+01 4.034e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 13:31:17,196 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-08-18 13:31:20,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=3916090.0, ans=0.0 2024-08-18 13:31:22,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5000, loss[loss=0.1165, beats_loss=0.008601, ecapa_loss=0.0001705, whisper_loss=0.1062, over 18120.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.08973, over 3842155.79 frames. 
], batch size: 68, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:31:58,339 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-18 13:32:07,186 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 30 from Vox, 21 from AS 2024-08-18 13:32:15,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.80 vs. limit=15.0 2024-08-18 13:32:20,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.90 vs. limit=15.0 2024-08-18 13:32:21,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=3916590.0, ans=0.0 2024-08-18 13:32:27,962 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 22 from Vox, 41 from AS 2024-08-18 13:32:36,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5050, loss[loss=0.09802, beats_loss=0.009254, ecapa_loss=0.0001334, whisper_loss=0.08744, over 14811.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001445, whisper_loss=0.08951, over 3857417.16 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:32:46,046 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 14 from Vox, 21 from AS 2024-08-18 13:32:55,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=15.0 2024-08-18 13:33:02,889 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 13:33:09,622 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 18 from Vox, 27 from AS 2024-08-18 13:33:16,810 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
18 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 13:33:27,754 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 16 from Vox, 33 from AS 2024-08-18 13:33:30,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3916990.0, ans=0.125 2024-08-18 13:33:30,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3916990.0, ans=0.125 2024-08-18 13:33:37,477 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.311e+01 2.560e+01 2.884e+01 4.690e+01, threshold=5.121e+01, percent-clipped=0.0 2024-08-18 13:33:51,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5100, loss[loss=0.124, beats_loss=0.009257, ecapa_loss=0.0001467, whisper_loss=0.1132, over 23450.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001438, whisper_loss=0.08987, over 3865966.01 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:33:57,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3917190.0, ans=0.125 2024-08-18 13:34:09,269 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-18 13:34:09,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3917290.0, ans=0.1 2024-08-18 13:34:15,012 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 16 from Vox, 23 from AS 2024-08-18 13:34:19,873 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 13:34:26,133 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 28 from Vox, 36 from AS 2024-08-18 13:34:37,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2024-08-18 13:34:39,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3917490.0, ans=0.0 2024-08-18 13:34:46,390 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 17 from Vox, 36 from AS 2024-08-18 13:34:49,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3917490.0, ans=0.125 2024-08-18 13:34:54,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3917590.0, ans=0.0 2024-08-18 13:34:55,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3917590.0, ans=0.125 2024-08-18 13:34:58,190 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 from AS 2024-08-18 13:35:05,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3917590.0, ans=0.0 2024-08-18 13:35:07,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5150, loss[loss=0.1086, beats_loss=0.01174, ecapa_loss=0.0001293, whisper_loss=0.09558, over 15231.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01066, ecapa_loss=0.0001434, whisper_loss=0.08992, over 3875126.34 frames. 
], batch size: 59, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:35:46,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3917890.0, ans=0.125 2024-08-18 13:35:46,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=3917890.0, ans=0.125 2024-08-18 13:35:55,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3917990.0, ans=0.125 2024-08-18 13:36:08,171 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.267e+01 2.541e+01 2.830e+01 4.847e+01, threshold=5.083e+01, percent-clipped=0.0 2024-08-18 13:36:21,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5200, loss[loss=0.09755, beats_loss=0.01259, ecapa_loss=0.0001229, whisper_loss=0.08373, over 19058.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01063, ecapa_loss=0.0001422, whisper_loss=0.08997, over 3844969.91 frames. ], batch size: 75, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:36:31,996 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 13:36:46,668 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 17 from Vox, 23 from AS 2024-08-18 13:37:32,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0 2024-08-18 13:37:37,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3918590.0, ans=0.125 2024-08-18 13:37:40,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5250, loss[loss=0.1006, beats_loss=0.008961, ecapa_loss=0.0001699, whisper_loss=0.08996, over 16801.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01058, ecapa_loss=0.0001418, whisper_loss=0.09025, over 3859749.16 frames. 
], batch size: 68, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:37:44,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3918690.0, ans=0.0 2024-08-18 13:38:00,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3918790.0, ans=0.0 2024-08-18 13:38:01,783 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 from AS 2024-08-18 13:38:07,974 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 13:38:23,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3918890.0, ans=0.0 2024-08-18 13:38:30,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3918990.0, ans=0.125 2024-08-18 13:38:42,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.331e+01 2.617e+01 2.849e+01 4.827e+01, threshold=5.234e+01, percent-clipped=0.0 2024-08-18 13:38:43,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.28 vs. limit=22.5 2024-08-18 13:38:46,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.55 vs. 
limit=22.5 2024-08-18 13:38:49,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=3919090.0, ans=0.07 2024-08-18 13:38:53,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3919090.0, ans=0.125 2024-08-18 13:38:54,071 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=15.0 2024-08-18 13:38:55,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5300, loss[loss=0.0975, beats_loss=0.00791, ecapa_loss=0.0001689, whisper_loss=0.0879, over 13546.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001435, whisper_loss=0.09034, over 3855169.28 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:39:02,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=3919190.0, ans=0.0 2024-08-18 13:39:22,154 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 19 from Vox, 29 from AS 2024-08-18 13:39:33,321 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 13:39:50,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3919490.0, ans=0.125 2024-08-18 13:39:53,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3919490.0, ans=0.05 2024-08-18 13:40:02,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3919590.0, ans=0.0 2024-08-18 13:40:08,313 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
30 from LS+wenet, 17 from Vox, 24 from AS 2024-08-18 13:40:13,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5350, loss[loss=0.1195, beats_loss=0.01029, ecapa_loss=0.0001437, whisper_loss=0.1078, over 23326.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.09118, over 3845783.23 frames. ], batch size: 90, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:40:18,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3919690.0, ans=0.1 2024-08-18 13:40:34,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3919790.0, ans=0.2 2024-08-18 13:40:45,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3919890.0, ans=0.125 2024-08-18 13:40:47,404 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 23 from LS+wenet, 9 from Vox, 22 from AS 2024-08-18 13:41:03,636 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 13:41:05,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3919990.0, ans=0.1 2024-08-18 13:41:14,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.871e+01 2.240e+01 2.441e+01 2.747e+01 4.165e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 13:41:25,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=3920090.0, ans=0.125 2024-08-18 13:41:27,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5400, loss[loss=0.09914, beats_loss=0.009751, ecapa_loss=0.0001405, whisper_loss=0.08798, over 19965.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01039, ecapa_loss=0.0001439, whisper_loss=0.09143, over 3856938.18 frames. 
], batch size: 79, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:41:36,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3920190.0, ans=0.125 2024-08-18 13:42:04,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3920390.0, ans=0.04949747468305833 2024-08-18 13:42:08,378 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 16 from Vox, 40 from AS 2024-08-18 13:42:19,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3920490.0, ans=0.025 2024-08-18 13:42:25,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3920590.0, ans=0.0 2024-08-18 13:42:36,648 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5450, loss[loss=0.1181, beats_loss=0.009078, ecapa_loss=0.0001428, whisper_loss=0.1076, over 18702.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01037, ecapa_loss=0.0001433, whisper_loss=0.09193, over 3869142.10 frames. ], batch size: 70, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:42:44,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=22.5 2024-08-18 13:42:56,267 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 21 from LS+wenet, 31 from Vox, 38 from AS 2024-08-18 13:43:03,224 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 27 from Vox, 37 from AS 2024-08-18 13:43:09,142 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.78 vs. limit=15.0 2024-08-18 13:43:21,344 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 29 from Vox, 32 from AS 2024-08-18 13:43:34,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.277e+01 2.488e+01 2.860e+01 4.810e+01, threshold=4.975e+01, percent-clipped=0.0 2024-08-18 13:43:45,729 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 15 from Vox, 46 from AS 2024-08-18 13:43:48,619 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5500, loss[loss=0.1104, beats_loss=0.01063, ecapa_loss=0.0001429, whisper_loss=0.09837, over 15639.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001434, whisper_loss=0.09114, over 3887880.67 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:44:04,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.72 vs. limit=10.0 2024-08-18 13:44:12,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=15.0 2024-08-18 13:44:27,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0 2024-08-18 13:44:41,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3921490.0, ans=0.125 2024-08-18 13:44:56,590 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 13:45:04,027 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5550, loss[loss=0.1052, beats_loss=0.008976, ecapa_loss=0.0001561, whisper_loss=0.09463, over 17479.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001441, whisper_loss=0.09091, over 3867243.21 frames. ], batch size: 67, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 13:45:04,191 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
18 from LS+wenet, 15 from Vox, 22 from AS 2024-08-18 13:45:15,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3921690.0, ans=0.0 2024-08-18 13:45:39,505 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 22 from Vox, 49 from AS 2024-08-18 13:45:47,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3921890.0, ans=0.0 2024-08-18 13:45:49,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3921890.0, ans=0.125 2024-08-18 13:45:49,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3921890.0, ans=0.05 2024-08-18 13:46:11,125 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.872e+01 2.360e+01 2.583e+01 2.976e+01 1.161e+02, threshold=5.166e+01, percent-clipped=2.0 2024-08-18 13:46:13,194 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 12 from Vox, 34 from AS 2024-08-18 13:46:13,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3922090.0, ans=0.125 2024-08-18 13:46:13,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=3922090.0, ans=0.125 2024-08-18 13:46:17,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3922090.0, ans=0.125 2024-08-18 13:46:25,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5600, loss[loss=0.0954, beats_loss=0.01296, ecapa_loss=0.0001257, whisper_loss=0.08118, over 21285.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001442, whisper_loss=0.09023, over 3874089.67 frames. 
], batch size: 89, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:46:27,978 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0688062459230423, model_norm_threshold=51.66341781616211 2024-08-18 13:46:28,147 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.409e+04, grad_sumsq=9.409e+04, orig_rms_sq=1.000e+00 2024-08-18 13:46:33,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3922190.0, ans=0.0 2024-08-18 13:46:38,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3922190.0, ans=0.09899494936611666 2024-08-18 13:46:49,997 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 18 from Vox, 41 from AS 2024-08-18 13:46:59,291 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 13:47:00,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3922390.0, ans=0.0 2024-08-18 13:47:09,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3922490.0, ans=0.05 2024-08-18 13:47:29,452 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 from AS 2024-08-18 13:47:40,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0 2024-08-18 13:47:40,544 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5650, loss[loss=0.08524, beats_loss=0.0119, ecapa_loss=0.0001243, whisper_loss=0.0721, over 15900.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01073, ecapa_loss=0.0001433, whisper_loss=0.08874, over 3876306.38 frames. 
], batch size: 62, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:48:15,213 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 from AS 2024-08-18 13:48:20,596 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 28 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 13:48:22,293 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 from AS 2024-08-18 13:48:45,597 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.361e+01 2.628e+01 2.997e+01 7.509e+02, threshold=5.255e+01, percent-clipped=3.0 2024-08-18 13:48:46,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2024-08-18 13:48:52,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3923090.0, ans=0.125 2024-08-18 13:48:57,922 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08530262112617493, model_norm_threshold=52.552433013916016 2024-08-18 13:48:58,095 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.807e+04, grad_sumsq=5.807e+04, orig_rms_sq=1.000e+00 2024-08-18 13:48:58,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5700, loss[loss=0.112, beats_loss=0.009441, ecapa_loss=0.0001196, whisper_loss=0.1014, over 15621.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01079, ecapa_loss=0.0001429, whisper_loss=0.08845, over 3876651.43 frames. 
], batch size: 58, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:49:08,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3923190.0, ans=10.0 2024-08-18 13:49:08,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2024-08-18 13:49:10,769 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 14 from Vox, 28 from AS 2024-08-18 13:49:35,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3923390.0, ans=0.0 2024-08-18 13:49:48,166 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 from AS 2024-08-18 13:50:11,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3923590.0, ans=0.125 2024-08-18 13:50:13,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=3923590.0, ans=0.05 2024-08-18 13:50:15,614 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5750, loss[loss=0.07177, beats_loss=0.01344, ecapa_loss=0.0001364, whisper_loss=0.05697, over 17637.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001449, whisper_loss=0.08932, over 3869995.97 frames. ], batch size: 72, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:50:16,440 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.69 vs. 
limit=15.0 2024-08-18 13:50:35,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3923790.0, ans=0.1 2024-08-18 13:50:53,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.97 vs. limit=12.0 2024-08-18 13:51:06,906 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=12.0 2024-08-18 13:51:10,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3923990.0, ans=0.125 2024-08-18 13:51:16,767 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.280e+01 2.579e+01 2.799e+01 6.161e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 13:51:19,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3924090.0, ans=0.125 2024-08-18 13:51:28,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5800, loss[loss=0.1049, beats_loss=0.01124, ecapa_loss=0.0001247, whisper_loss=0.0924, over 18541.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01056, ecapa_loss=0.0001458, whisper_loss=0.08974, over 3861864.87 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:51:35,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3924190.0, ans=0.125 2024-08-18 13:51:39,650 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 from AS 2024-08-18 13:52:07,355 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 from AS 2024-08-18 13:52:10,889 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
29 from LS+wenet, 30 from Vox, 27 from AS 2024-08-18 13:52:28,780 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 13 from Vox, 37 from AS 2024-08-18 13:52:39,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-18 13:52:41,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5850, loss[loss=0.1182, beats_loss=0.00944, ecapa_loss=0.0001566, whisper_loss=0.1072, over 22777.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001457, whisper_loss=0.08963, over 3875270.99 frames. ], batch size: 93, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:52:43,202 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 17 from Vox, 45 from AS 2024-08-18 13:52:44,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3924690.0, ans=0.1 2024-08-18 13:53:15,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3924890.0, ans=0.125 2024-08-18 13:53:21,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3924890.0, ans=0.1 2024-08-18 13:53:30,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.66 vs. limit=15.0 2024-08-18 13:53:31,008 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
23 from LS+wenet, 30 from Vox, 40 fro AS 2024-08-18 13:53:38,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=3924990.0, ans=0.125 2024-08-18 13:53:45,906 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.710e+01 2.307e+01 2.558e+01 2.890e+01 4.953e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 13:53:47,794 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-18 13:54:00,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5900, loss[loss=0.08586, beats_loss=0.01136, ecapa_loss=0.0001485, whisper_loss=0.07302, over 14246.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.000145, whisper_loss=0.08992, over 3879371.40 frames. ], batch size: 56, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:54:10,364 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 35 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 13:54:22,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3925290.0, ans=0.125 2024-08-18 13:54:25,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3925290.0, ans=0.2 2024-08-18 13:54:39,242 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 13:55:16,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3925590.0, ans=0.0 2024-08-18 13:55:16,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-08-18 13:55:20,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 5950, loss[loss=0.1152, beats_loss=0.008333, ecapa_loss=0.0001375, whisper_loss=0.1055, over 16624.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01065, ecapa_loss=0.0001448, whisper_loss=0.08919, over 3883960.29 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:55:24,013 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 25 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-18 13:55:49,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=3925790.0, ans=0.125 2024-08-18 13:55:51,077 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 13:55:56,570 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 11 from Vox, 25 fro AS 2024-08-18 13:56:03,992 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.000e+01 2024-08-18 13:56:04,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3925890.0, ans=0.0 2024-08-18 13:56:04,890 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 24 from LS+wenet, 15 from Vox, 17 fro AS 2024-08-18 13:56:16,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3925990.0, ans=0.0 2024-08-18 13:56:24,957 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.192e+01 2.474e+01 2.892e+01 4.028e+01, threshold=4.947e+01, percent-clipped=0.0 2024-08-18 13:56:38,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6000, loss[loss=0.09786, beats_loss=0.01098, ecapa_loss=0.000142, whisper_loss=0.08545, over 23315.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001442, whisper_loss=0.08994, over 3870847.10 frames. 
], batch size: 93, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 13:56:38,136 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 13:57:15,601 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005218, whisper_loss=0.2477, over 922467.00 frames. 2024-08-18 13:57:34,364 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on SV_voxceleb1: loss=0.004081, beats_loss=0, ecapa_loss=0.0004081, whisper_loss=0, over 939242.00 frames. 2024-08-18 13:59:16,622 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on AT_audioset: loss=0.02317, beats_loss=0.02317, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 13:59:16,627 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 13:59:37,466 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 36 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 13:59:38,844 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 13:59:51,379 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 14:00:31,458 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:00:33,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6050, loss[loss=0.08326, beats_loss=0.008865, ecapa_loss=0.0001565, whisper_loss=0.07283, over 15650.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001436, whisper_loss=0.08991, over 3873028.47 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:00:40,127 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=12.0 2024-08-18 14:00:51,391 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
14 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 14:00:54,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.15 vs. limit=15.0 2024-08-18 14:01:16,076 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 14:01:18,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3926890.0, ans=0.125 2024-08-18 14:01:18,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3926890.0, ans=0.0 2024-08-18 14:01:30,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. limit=15.0 2024-08-18 14:01:33,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3926990.0, ans=0.125 2024-08-18 14:01:38,796 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.321e+01 2.592e+01 2.870e+01 3.846e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-18 14:01:43,040 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 14:01:49,565 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 14 from Vox, 20 fro AS 2024-08-18 14:01:53,344 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6100, loss[loss=0.09693, beats_loss=0.01021, ecapa_loss=0.0001147, whisper_loss=0.08557, over 16124.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001433, whisper_loss=0.08981, over 3860195.39 frames. 
], batch size: 59, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:02:08,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3927290.0, ans=0.0 2024-08-18 14:02:36,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3927390.0, ans=0.0 2024-08-18 14:03:10,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6150, loss[loss=0.09168, beats_loss=0.01235, ecapa_loss=0.0001477, whisper_loss=0.07785, over 22526.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01065, ecapa_loss=0.0001429, whisper_loss=0.09012, over 3918951.42 frames. ], batch size: 93, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:03:15,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.36 vs. limit=15.0 2024-08-18 14:03:15,515 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 14:03:21,931 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 14:03:25,969 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 14:03:56,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3927990.0, ans=0.125 2024-08-18 14:04:11,607 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.379e+01 2.632e+01 2.794e+01 5.915e+01, threshold=5.263e+01, percent-clipped=2.0 2024-08-18 14:04:18,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3928090.0, ans=0.0 2024-08-18 14:04:18,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=3928090.0, ans=10.0 2024-08-18 14:04:24,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3928190.0, ans=0.1 2024-08-18 14:04:24,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6200, loss[loss=0.1079, beats_loss=0.01031, ecapa_loss=0.0001331, whisper_loss=0.09621, over 18483.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01067, ecapa_loss=0.0001427, whisper_loss=0.08999, over 3929768.09 frames. ], batch size: 71, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:04:26,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3928190.0, ans=0.2 2024-08-18 14:04:38,387 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 14:05:10,396 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 14:05:31,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3928590.0, ans=0.2 2024-08-18 14:05:43,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6250, loss[loss=0.08753, beats_loss=0.009245, ecapa_loss=0.0001767, whisper_loss=0.07652, over 15592.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01062, ecapa_loss=0.0001421, whisper_loss=0.08927, over 3913741.34 frames. ], batch size: 62, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:05:56,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3928690.0, ans=0.0 2024-08-18 14:06:03,432 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 14:06:04,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3928790.0, ans=0.125 2024-08-18 14:06:09,243 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 14:06:14,961 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-18 14:06:25,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3928890.0, ans=0.07 2024-08-18 14:06:28,135 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-18 14:06:46,535 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.311e+01 2.533e+01 2.797e+01 1.821e+02, threshold=5.065e+01, percent-clipped=2.0 2024-08-18 14:06:59,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6300, loss[loss=0.1307, beats_loss=0.009284, ecapa_loss=0.0001472, whisper_loss=0.1199, over 20432.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01058, ecapa_loss=0.0001434, whisper_loss=0.0895, over 3873072.10 frames. ], batch size: 78, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:07:06,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3929190.0, ans=0.125 2024-08-18 14:07:47,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3929490.0, ans=0.1 2024-08-18 14:07:57,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.79 vs. limit=22.5 2024-08-18 14:07:59,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=12.0 2024-08-18 14:08:09,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3929590.0, ans=0.1 2024-08-18 14:08:10,288 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 40 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 14:08:13,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.57 vs. limit=22.5 2024-08-18 14:08:15,683 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6350, loss[loss=0.1109, beats_loss=0.009131, ecapa_loss=0.0001377, whisper_loss=0.1004, over 16419.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01061, ecapa_loss=0.000143, whisper_loss=0.08946, over 3847868.92 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:08:20,965 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-18 14:08:25,270 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
38 from LS+wenet, 12 from Vox, 40 fro AS 2024-08-18 14:08:34,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3929790.0, ans=0.125 2024-08-18 14:08:41,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3929790.0, ans=0.125 2024-08-18 14:08:49,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=3929890.0, ans=0.125 2024-08-18 14:09:13,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3929990.0, ans=0.125 2024-08-18 14:09:13,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3929990.0, ans=0.1 2024-08-18 14:09:19,617 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.236e+01 2.432e+01 2.687e+01 3.502e+01, threshold=4.864e+01, percent-clipped=0.0 2024-08-18 14:09:31,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=3930190.0, ans=0.2 2024-08-18 14:09:32,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6400, loss[loss=0.1097, beats_loss=0.0089, ecapa_loss=0.0001941, whisper_loss=0.09888, over 18626.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001436, whisper_loss=0.08999, over 3850367.42 frames. ], batch size: 77, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:09:33,700 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 14:10:09,489 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 14:10:44,089 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
22 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 14:10:45,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6450, loss[loss=0.1008, beats_loss=0.01076, ecapa_loss=0.0001313, whisper_loss=0.08872, over 19397.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001445, whisper_loss=0.09083, over 3860034.39 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:10:45,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=3930690.0, ans=0.2 2024-08-18 14:10:47,850 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 14:11:07,587 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 14:11:24,334 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 14:11:28,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=3930990.0, ans=10.0 2024-08-18 14:11:34,159 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 14:11:37,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.963e+01 2.379e+01 2.625e+01 2.941e+01 1.011e+02, threshold=5.251e+01, percent-clipped=1.0 2024-08-18 14:11:44,040 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.03 vs. 
limit=10.0 2024-08-18 14:11:46,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3931090.0, ans=0.125 2024-08-18 14:11:48,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3931190.0, ans=0.0 2024-08-18 14:11:49,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6500, loss[loss=0.09849, beats_loss=0.01111, ecapa_loss=0.0001384, whisper_loss=0.086, over 19761.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001434, whisper_loss=0.09066, over 3888058.48 frames. ], batch size: 80, lr: 2.27e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:11:53,898 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 13 from Vox, 48 fro AS 2024-08-18 14:12:00,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3931190.0, ans=0.125 2024-08-18 14:12:01,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3931290.0, ans=0.125 2024-08-18 14:12:05,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3931290.0, ans=0.2 2024-08-18 14:12:20,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3931390.0, ans=0.125 2024-08-18 14:12:42,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.23 vs. limit=15.0 2024-08-18 14:12:52,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6550, loss[loss=0.09934, beats_loss=0.01119, ecapa_loss=0.0001613, whisper_loss=0.08653, over 20353.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01052, ecapa_loss=0.000144, whisper_loss=0.09121, over 3924310.54 frames. 
], batch size: 86, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:12:55,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=3931690.0, ans=10.0 2024-08-18 14:12:56,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3931690.0, ans=0.0 2024-08-18 14:13:17,925 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-18 14:13:18,952 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08532015979290009, model_norm_threshold=52.50708770751953 2024-08-18 14:13:19,126 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.293e+04, grad_sumsq=4.293e+04, orig_rms_sq=1.000e+00 2024-08-18 14:13:30,544 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-18 14:13:30,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3931990.0, ans=0.07 2024-08-18 14:13:33,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3931990.0, ans=0.125 2024-08-18 14:13:36,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3931990.0, ans=0.0 2024-08-18 14:13:41,839 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 14:13:43,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=3932090.0, ans=0.125 2024-08-18 14:13:45,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.337e+01 2.574e+01 2.944e+01 6.154e+02, threshold=5.148e+01, percent-clipped=1.0 2024-08-18 14:13:47,782 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 14:13:55,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6600, loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001328, whisper_loss=0.08992, over 21014.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01044, ecapa_loss=0.0001438, whisper_loss=0.09221, over 3937388.83 frames. ], batch size: 83, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:13:59,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=3932190.0, ans=0.125 2024-08-18 14:14:05,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3932190.0, ans=0.04949747468305833 2024-08-18 14:14:09,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3932290.0, ans=0.2 2024-08-18 14:14:19,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3932390.0, ans=0.1 2024-08-18 14:14:34,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2024-08-18 14:14:34,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=3932490.0, ans=0.0 2024-08-18 14:14:39,300 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 14:14:56,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6650, loss[loss=0.07484, beats_loss=0.01132, ecapa_loss=0.0001396, whisper_loss=0.06212, over 15895.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001461, whisper_loss=0.09136, over 3947595.39 frames. ], batch size: 64, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:15:04,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=3932690.0, ans=0.05 2024-08-18 14:15:05,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=3932690.0, ans=0.05 2024-08-18 14:15:19,107 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 14:15:25,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.59 vs. limit=15.0 2024-08-18 14:15:31,784 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 14:15:49,081 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.331e+01 2.674e+01 2.933e+01 1.002e+02, threshold=5.348e+01, percent-clipped=1.0 2024-08-18 14:15:58,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6700, loss[loss=0.09186, beats_loss=0.006968, ecapa_loss=0.0001589, whisper_loss=0.08331, over 14415.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01038, ecapa_loss=0.0001451, whisper_loss=0.09176, over 3921071.07 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:16:01,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3933190.0, ans=0.1 2024-08-18 14:16:10,816 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 14:16:13,306 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-18 14:16:23,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3933390.0, ans=0.125 2024-08-18 14:16:27,082 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-18 14:16:29,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=3933390.0, ans=0.95 2024-08-18 14:16:37,216 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 14:16:38,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3933490.0, ans=0.1 2024-08-18 14:16:41,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3933490.0, ans=0.125 2024-08-18 14:16:59,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=3933590.0, ans=0.0 2024-08-18 14:16:59,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3933590.0, ans=0.1 2024-08-18 14:17:02,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6750, loss[loss=0.0862, beats_loss=0.01276, ecapa_loss=0.0001273, whisper_loss=0.07217, over 21912.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001463, whisper_loss=0.09153, over 3929722.37 frames. 
], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:17:10,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3933690.0, ans=0.125 2024-08-18 14:17:18,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2024-08-18 14:17:24,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3933790.0, ans=0.125 2024-08-18 14:17:38,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.12 vs. limit=10.0 2024-08-18 14:17:41,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3933990.0, ans=0.125 2024-08-18 14:17:54,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3934090.0, ans=0.0 2024-08-18 14:17:55,338 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.453e+01 2.659e+01 2.920e+01 3.778e+02, threshold=5.318e+01, percent-clipped=4.0 2024-08-18 14:18:05,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6800, loss[loss=0.1023, beats_loss=0.0119, ecapa_loss=0.0001114, whisper_loss=0.08929, over 18537.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001469, whisper_loss=0.09068, over 3925987.11 frames. 
], batch size: 71, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:18:11,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3934190.0, ans=0.1 2024-08-18 14:18:14,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3934190.0, ans=0.125 2024-08-18 14:18:16,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3934190.0, ans=0.125 2024-08-18 14:18:45,616 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-08-18 14:18:58,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=3934590.0, ans=0.04949747468305833 2024-08-18 14:19:04,772 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 12 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 14:19:09,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6850, loss[loss=0.09544, beats_loss=0.01099, ecapa_loss=0.0001688, whisper_loss=0.08276, over 21662.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001465, whisper_loss=0.09055, over 3888980.24 frames. ], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:19:27,284 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 14:19:32,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.62 vs. limit=15.0 2024-08-18 14:19:33,587 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. 
limit=10.0 2024-08-18 14:19:54,999 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 12 from Vox, 35 fro AS 2024-08-18 14:20:02,975 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=5.152e-01 2024-08-18 14:20:03,730 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.935e+01 2.389e+01 2.625e+01 3.075e+01 4.351e+02, threshold=5.250e+01, percent-clipped=2.0 2024-08-18 14:20:04,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=3935090.0, ans=0.125 2024-08-18 14:20:10,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3935090.0, ans=0.125 2024-08-18 14:20:12,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3935090.0, ans=0.2 2024-08-18 14:20:14,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6900, loss[loss=0.08519, beats_loss=0.01276, ecapa_loss=0.0001577, whisper_loss=0.07085, over 20624.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001462, whisper_loss=0.09087, over 3897745.03 frames. ], batch size: 85, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:20:18,298 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 14:20:24,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.00 vs. limit=15.0 2024-08-18 14:20:28,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3935290.0, ans=0.0 2024-08-18 14:20:35,651 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 14:20:52,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=3935490.0, ans=0.2 2024-08-18 14:20:52,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3935490.0, ans=0.125 2024-08-18 14:21:00,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3935490.0, ans=0.09899494936611666 2024-08-18 14:21:03,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3935490.0, ans=0.125 2024-08-18 14:21:04,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. limit=22.5 2024-08-18 14:21:07,734 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 10 from Vox, 26 fro AS 2024-08-18 14:21:12,858 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-18 14:21:16,744 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 14:21:19,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 6950, loss[loss=0.111, beats_loss=0.01073, ecapa_loss=0.0001346, whisper_loss=0.09895, over 17185.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001459, whisper_loss=0.09116, over 3865468.38 frames. ], batch size: 66, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:21:29,560 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 14:21:29,874 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.49 vs. 
limit=12.0 2024-08-18 14:21:33,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3935790.0, ans=0.125 2024-08-18 14:21:36,728 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 37 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 14:21:48,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3935890.0, ans=0.125 2024-08-18 14:22:12,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.310e+01 2.521e+01 2.778e+01 4.175e+02, threshold=5.041e+01, percent-clipped=1.0 2024-08-18 14:22:13,435 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2024-08-18 14:22:23,214 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7000, loss[loss=0.09757, beats_loss=0.009722, ecapa_loss=0.00016, whisper_loss=0.08625, over 17019.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01043, ecapa_loss=0.0001459, whisper_loss=0.09169, over 3880222.02 frames. ], batch size: 69, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:22:46,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3936390.0, ans=0.0 2024-08-18 14:22:52,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=3936390.0, ans=0.07 2024-08-18 14:23:21,902 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 27 from Vox, 32 fro AS 2024-08-18 14:23:25,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7050, loss[loss=0.08311, beats_loss=0.01313, ecapa_loss=9.296e-05, whisper_loss=0.06905, over 18535.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001461, whisper_loss=0.09082, over 3907811.11 frames. 
], batch size: 70, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:23:29,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3936690.0, ans=0.0 2024-08-18 14:23:32,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3936690.0, ans=0.125 2024-08-18 14:23:40,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3936790.0, ans=0.0 2024-08-18 14:23:44,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3936790.0, ans=0.0 2024-08-18 14:23:45,853 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 14:24:06,267 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:24:08,508 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 14:24:18,936 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.218e+01 2.427e+01 2.693e+01 4.080e+01, threshold=4.854e+01, percent-clipped=0.0 2024-08-18 14:24:24,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=3937090.0, ans=0.02 2024-08-18 14:24:28,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7100, loss[loss=0.09693, beats_loss=0.0113, ecapa_loss=0.0001457, whisper_loss=0.08416, over 14002.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001449, whisper_loss=0.09039, over 3897440.93 frames. ], batch size: 57, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:24:43,554 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 14:24:48,599 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
25 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 14:24:55,822 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 14:24:58,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=3937390.0, ans=0.125 2024-08-18 14:25:06,966 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-18 14:25:13,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3937490.0, ans=0.0 2024-08-18 14:25:18,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3937590.0, ans=0.0 2024-08-18 14:25:19,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3937590.0, ans=0.125 2024-08-18 14:25:29,412 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 14:25:30,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7150, loss[loss=0.09064, beats_loss=0.009416, ecapa_loss=0.0001417, whisper_loss=0.07981, over 15785.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.000144, whisper_loss=0.09002, over 3913625.74 frames. ], batch size: 61, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:25:56,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3937890.0, ans=0.0 2024-08-18 14:26:04,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. 
limit=6.0 2024-08-18 14:26:22,363 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.289e+01 2.542e+01 2.748e+01 4.524e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-18 14:26:32,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7200, loss[loss=0.1211, beats_loss=0.009859, ecapa_loss=0.0001283, whisper_loss=0.1099, over 23093.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001436, whisper_loss=0.0899, over 3915556.32 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:26:32,506 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 14:26:40,276 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-18 14:26:43,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3938290.0, ans=0.0 2024-08-18 14:26:43,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3938290.0, ans=0.125 2024-08-18 14:26:58,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3938390.0, ans=0.1 2024-08-18 14:27:06,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=3938390.0, ans=0.125 2024-08-18 14:27:09,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3938490.0, ans=0.0 2024-08-18 14:27:14,109 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
27 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 14:27:14,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3938490.0, ans=0.05 2024-08-18 14:27:17,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=12.0 2024-08-18 14:27:20,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.17 vs. limit=12.0 2024-08-18 14:27:33,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7250, loss[loss=0.09455, beats_loss=0.0115, ecapa_loss=0.0001144, whisper_loss=0.08191, over 22939.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001443, whisper_loss=0.09049, over 3927109.24 frames. ], batch size: 89, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:27:35,056 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 28 from LS+wenet, 11 from Vox, 30 fro AS 2024-08-18 14:27:37,598 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 14:27:44,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3938690.0, ans=0.125 2024-08-18 14:27:44,983 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 14:27:50,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3938790.0, ans=0.1 2024-08-18 14:28:06,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.44 vs. 
limit=10.0 2024-08-18 14:28:07,892 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.628e+00 2024-08-18 14:28:10,113 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 14:28:13,773 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 14:28:16,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3938990.0, ans=0.0 2024-08-18 14:28:17,537 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 9 from Vox, 34 fro AS 2024-08-18 14:28:20,459 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:28:23,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.53 vs. limit=15.0 2024-08-18 14:28:25,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3939090.0, ans=0.0 2024-08-18 14:28:26,058 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.318e+01 2.608e+01 2.955e+01 6.690e+01, threshold=5.215e+01, percent-clipped=2.0 2024-08-18 14:28:27,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3939090.0, ans=0.1 2024-08-18 14:28:33,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3939090.0, ans=0.0 2024-08-18 14:28:35,870 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7300, loss[loss=0.09843, beats_loss=0.01187, ecapa_loss=0.000119, whisper_loss=0.08537, over 23869.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0105, ecapa_loss=0.0001431, whisper_loss=0.09121, over 3950032.11 frames. 
], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:28:37,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3939190.0, ans=0.0 2024-08-18 14:28:43,573 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-18 14:28:48,569 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 14:28:57,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=3939290.0, ans=0.0 2024-08-18 14:29:19,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3939490.0, ans=0.0 2024-08-18 14:29:22,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3939490.0, ans=0.125 2024-08-18 14:29:30,263 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 14:29:37,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7350, loss[loss=0.09605, beats_loss=0.01094, ecapa_loss=0.0001453, whisper_loss=0.08365, over 15887.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001434, whisper_loss=0.09103, over 3922061.16 frames. ], batch size: 64, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:29:45,067 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 14:30:11,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=3939890.0, ans=0.0 2024-08-18 14:30:11,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3939890.0, ans=0.1 2024-08-18 14:30:12,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3939890.0, ans=0.09899494936611666 2024-08-18 14:30:21,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3939990.0, ans=0.125 2024-08-18 14:30:27,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=3940090.0, ans=0.125 2024-08-18 14:30:29,945 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.344e+01 2.539e+01 2.800e+01 8.685e+01, threshold=5.077e+01, percent-clipped=1.0 2024-08-18 14:30:37,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3940090.0, ans=0.0 2024-08-18 14:30:39,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3940190.0, ans=0.125 2024-08-18 14:30:40,017 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7400, loss[loss=0.08531, beats_loss=0.01011, ecapa_loss=0.0001496, whisper_loss=0.0737, over 22227.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.09096, over 3897617.92 frames. 
], batch size: 92, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:30:52,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3940290.0, ans=0.2 2024-08-18 14:30:53,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=3940290.0, ans=0.05 2024-08-18 14:30:59,887 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 14:31:03,384 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 14:31:35,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3940590.0, ans=0.0 2024-08-18 14:31:41,553 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7450, loss[loss=0.1431, beats_loss=0.006945, ecapa_loss=0.0001586, whisper_loss=0.1346, over 16993.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.000145, whisper_loss=0.09154, over 3907838.47 frames. ], batch size: 63, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:31:41,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=3940690.0, ans=0.125 2024-08-18 14:31:45,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3940690.0, ans=0.025 2024-08-18 14:31:46,684 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 11 from Vox, 31 fro AS 2024-08-18 14:31:50,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=3940690.0, ans=0.125 2024-08-18 14:31:53,634 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. 
limit=6.0 2024-08-18 14:31:54,135 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 14:31:56,497 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 14:32:09,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3940890.0, ans=0.0 2024-08-18 14:32:15,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3940890.0, ans=0.2 2024-08-18 14:32:16,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=3940890.0, ans=0.025 2024-08-18 14:32:33,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.745e+01 2.339e+01 2.554e+01 2.965e+01 5.290e+01, threshold=5.108e+01, percent-clipped=2.0 2024-08-18 14:32:37,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.15 vs. limit=12.0 2024-08-18 14:32:41,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3941090.0, ans=10.0 2024-08-18 14:32:43,009 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7500, loss[loss=0.1024, beats_loss=0.009754, ecapa_loss=0.0001481, whisper_loss=0.09121, over 15268.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001453, whisper_loss=0.09108, over 3888750.22 frames. ], batch size: 60, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:32:54,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3941290.0, ans=0.125 2024-08-18 14:32:55,067 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
19 from LS+wenet, 32 from Vox, 42 fro AS 2024-08-18 14:32:55,362 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:33:07,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=15.0 2024-08-18 14:33:10,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3941390.0, ans=0.0 2024-08-18 14:33:22,782 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.92 vs. limit=6.0 2024-08-18 14:33:29,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3941490.0, ans=0.0 2024-08-18 14:33:45,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3941590.0, ans=0.125 2024-08-18 14:33:47,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7550, loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001667, whisper_loss=0.09043, over 18231.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.000145, whisper_loss=0.09111, over 3887586.39 frames. ], batch size: 76, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:33:47,620 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 14:33:48,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3941690.0, ans=0.0 2024-08-18 14:33:52,230 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 14:34:11,982 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
18 from LS+wenet, 30 from Vox, 36 fro AS 2024-08-18 14:34:13,831 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 14:34:16,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2024-08-18 14:34:25,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3941890.0, ans=0.125 2024-08-18 14:34:30,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3941890.0, ans=0.0 2024-08-18 14:34:44,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3941990.0, ans=0.125 2024-08-18 14:34:53,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.267e+01 2.538e+01 2.826e+01 4.465e+01, threshold=5.075e+01, percent-clipped=0.0 2024-08-18 14:35:05,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7600, loss[loss=0.08314, beats_loss=0.01024, ecapa_loss=0.000186, whisper_loss=0.07104, over 12984.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.000146, whisper_loss=0.08993, over 3847672.69 frames. ], batch size: 55, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:35:24,440 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 14:35:32,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.29 vs. 
limit=22.5 2024-08-18 14:35:47,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3942390.0, ans=0.1 2024-08-18 14:35:51,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3942390.0, ans=0.09899494936611666 2024-08-18 14:35:55,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3942490.0, ans=0.125 2024-08-18 14:35:57,125 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 14:36:06,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3942490.0, ans=0.0 2024-08-18 14:36:32,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7650, loss[loss=0.1196, beats_loss=0.01071, ecapa_loss=0.0001732, whisper_loss=0.1071, over 22599.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001456, whisper_loss=0.08986, over 3873316.80 frames. 
], batch size: 94, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:36:38,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3942690.0, ans=0.0 2024-08-18 14:37:07,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3942890.0, ans=0.125 2024-08-18 14:37:11,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3942890.0, ans=0.0 2024-08-18 14:37:19,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3942890.0, ans=0.125 2024-08-18 14:37:44,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.361e+01 2.566e+01 2.916e+01 1.165e+02, threshold=5.131e+01, percent-clipped=2.0 2024-08-18 14:37:54,954 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 22 from Vox, 14 fro AS 2024-08-18 14:37:58,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7700, loss[loss=0.1011, beats_loss=0.01096, ecapa_loss=0.0001084, whisper_loss=0.08907, over 14794.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01047, ecapa_loss=0.0001441, whisper_loss=0.08976, over 3858698.56 frames. ], batch size: 54, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:38:07,112 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 14:38:13,072 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
11 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 14:38:18,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3943290.0, ans=0.1 2024-08-18 14:38:42,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=3943490.0, ans=0.0 2024-08-18 14:38:43,277 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 30 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 14:38:53,578 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2024-08-18 14:39:01,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2024-08-18 14:39:03,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7750, loss[loss=0.1146, beats_loss=0.01248, ecapa_loss=0.0001239, whisper_loss=0.1008, over 22564.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001453, whisper_loss=0.09011, over 3850409.38 frames. ], batch size: 88, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:39:06,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3943690.0, ans=0.1 2024-08-18 14:39:16,816 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0 2024-08-18 14:39:24,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3943790.0, ans=0.125 2024-08-18 14:39:44,215 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-18 14:39:47,889 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
21 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-18 14:39:56,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.246e+01 2.508e+01 2.794e+01 3.157e+02, threshold=5.017e+01, percent-clipped=3.0 2024-08-18 14:39:56,353 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 14:40:06,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7800, loss[loss=0.1177, beats_loss=0.01029, ecapa_loss=0.000139, whisper_loss=0.1061, over 22154.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001453, whisper_loss=0.09038, over 3910239.60 frames. ], batch size: 91, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:40:07,572 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 34 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 14:40:08,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3944190.0, ans=0.0 2024-08-18 14:40:18,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3944290.0, ans=0.0 2024-08-18 14:40:25,308 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 14:40:29,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.12 vs. limit=6.0 2024-08-18 14:40:31,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3944390.0, ans=0.125 2024-08-18 14:40:36,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=3944390.0, ans=0.0 2024-08-18 14:40:42,604 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 14:40:52,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=3944490.0, ans=0.125 2024-08-18 14:41:07,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3944590.0, ans=0.0 2024-08-18 14:41:11,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7850, loss[loss=0.1125, beats_loss=0.01145, ecapa_loss=0.0001201, whisper_loss=0.0998, over 19195.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001451, whisper_loss=0.09066, over 3923745.00 frames. ], batch size: 75, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:41:30,747 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-18 14:41:31,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3944790.0, ans=0.125 2024-08-18 14:41:40,012 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 14:41:40,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3944890.0, ans=0.125 2024-08-18 14:42:09,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.337e+01 2.593e+01 2.821e+01 2.204e+02, threshold=5.186e+01, percent-clipped=1.0 2024-08-18 14:42:16,098 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-18 14:42:19,894 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7900, loss[loss=0.09645, beats_loss=0.01058, ecapa_loss=0.0001474, whisper_loss=0.08439, over 21726.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.000145, whisper_loss=0.09035, over 3922334.30 frames. 
], batch size: 90, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:42:30,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=8.0 2024-08-18 14:42:35,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3945290.0, ans=0.125 2024-08-18 14:42:37,183 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 21 from Vox, 23 from AS 2024-08-18 14:42:48,283 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0 2024-08-18 14:43:05,565 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 27 from Vox, 34 from AS 2024-08-18 14:43:07,550 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=22.5 2024-08-18 14:43:11,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3945590.0, ans=0.125 2024-08-18 14:43:19,729 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 from AS 2024-08-18 14:43:24,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 7950, loss[loss=0.1185, beats_loss=0.007794, ecapa_loss=0.0001815, whisper_loss=0.1088, over 20814.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.000146, whisper_loss=0.09097, over 3923998.75 frames.
], batch size: 82, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:43:32,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3945690.0, ans=0.0 2024-08-18 14:43:34,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3945690.0, ans=0.125 2024-08-18 14:43:36,564 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.13 vs. limit=10.0 2024-08-18 14:43:38,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3945790.0, ans=0.125 2024-08-18 14:43:45,885 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 27 from Vox, 28 from AS 2024-08-18 14:43:49,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3945890.0, ans=0.1 2024-08-18 14:44:01,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2024-08-18 14:44:02,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=3945990.0, ans=0.0 2024-08-18 14:44:03,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2024-08-18 14:44:04,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=3945990.0, ans=0.0 2024-08-18 14:44:13,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.95 vs.
limit=15.0 2024-08-18 14:44:18,407 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.738e+01 2.253e+01 2.458e+01 2.855e+01 4.177e+01, threshold=4.916e+01, percent-clipped=0.0 2024-08-18 14:44:21,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3946090.0, ans=0.0 2024-08-18 14:44:22,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3946090.0, ans=0.125 2024-08-18 14:44:22,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3946090.0, ans=0.1 2024-08-18 14:44:28,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8000, loss[loss=0.112, beats_loss=0.008016, ecapa_loss=0.0001647, whisper_loss=0.1023, over 22371.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001452, whisper_loss=0.09059, over 3899851.36 frames. ], batch size: 90, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:44:35,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-18 14:44:38,741 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 22 from Vox, 38 from AS 2024-08-18 14:44:40,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3946290.0, ans=0.125 2024-08-18 14:44:53,478 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-18 14:45:01,186 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.82 vs.
limit=15.0 2024-08-18 14:45:03,049 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 from AS 2024-08-18 14:45:16,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3946490.0, ans=0.2 2024-08-18 14:45:17,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3946490.0, ans=0.2 2024-08-18 14:45:17,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3946490.0, ans=0.0 2024-08-18 14:45:29,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2024-08-18 14:45:31,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8050, loss[loss=0.1099, beats_loss=0.008901, ecapa_loss=0.000121, whisper_loss=0.0998, over 14629.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000145, whisper_loss=0.09051, over 3888850.07 frames.
], batch size: 55, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:45:42,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3946690.0, ans=0.125 2024-08-18 14:45:48,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3946790.0, ans=0.125 2024-08-18 14:45:48,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3946790.0, ans=0.125 2024-08-18 14:45:50,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3946790.0, ans=0.125 2024-08-18 14:45:51,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3946790.0, ans=0.1 2024-08-18 14:46:14,625 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 from AS 2024-08-18 14:46:22,769 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.794e-02 2024-08-18 14:46:24,657 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.326e+01 2.639e+01 3.174e+01 1.521e+02, threshold=5.277e+01, percent-clipped=3.0 2024-08-18 14:46:28,472 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 24 from Vox, 43 from AS 2024-08-18 14:46:35,146 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8100, loss[loss=0.09879, beats_loss=0.009502, ecapa_loss=0.0001771, whisper_loss=0.08752, over 14110.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001445, whisper_loss=0.08993, over 3886367.97 frames.
], batch size: 58, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:46:38,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=3947190.0, ans=0.125 2024-08-18 14:47:14,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3947390.0, ans=0.125 2024-08-18 14:47:14,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=3947390.0, ans=0.125 2024-08-18 14:47:17,705 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 14:47:23,017 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 from AS 2024-08-18 14:47:27,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.96 vs. limit=22.5 2024-08-18 14:47:37,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=15.0 2024-08-18 14:47:41,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8150, loss[loss=0.1052, beats_loss=0.009937, ecapa_loss=0.0001416, whisper_loss=0.09384, over 17547.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001451, whisper_loss=0.09001, over 3865967.94 frames. ], batch size: 67, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:47:49,393 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 from AS 2024-08-18 14:47:57,050 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 29 from Vox, 34 from AS 2024-08-18 14:48:04,509 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 24 from Vox, 22 from AS 2024-08-18 14:48:31,259 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts.
19 from LS+wenet, 17 from Vox, 32 from AS 2024-08-18 14:48:35,010 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.970e+01 2.257e+01 2.558e+01 2.766e+01 4.647e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 14:48:40,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-18 14:48:45,285 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8200, loss[loss=0.107, beats_loss=0.007797, ecapa_loss=0.0001405, whisper_loss=0.09783, over 19293.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001456, whisper_loss=0.09033, over 3868135.66 frames. ], batch size: 74, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:48:47,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3948190.0, ans=0.0 2024-08-18 14:49:19,843 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 40 from LS+wenet, 13 from Vox, 38 from AS 2024-08-18 14:49:22,090 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 19 from LS+wenet, 25 from Vox, 48 from AS 2024-08-18 14:49:30,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.45 vs. limit=22.5 2024-08-18 14:49:31,199 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 21 from Vox, 32 from AS 2024-08-18 14:49:33,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-08-18 14:49:49,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8250, loss[loss=0.1004, beats_loss=0.0112, ecapa_loss=0.0001416, whisper_loss=0.08775, over 17633.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01049, ecapa_loss=0.0001454, whisper_loss=0.0895, over 3896549.77 frames.
], batch size: 71, lr: 2.27e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:49:53,531 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 20 from Vox, 38 from AS 2024-08-18 14:50:19,807 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 14:50:20,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=3948890.0, ans=0.0 2024-08-18 14:50:26,255 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 from AS 2024-08-18 14:50:44,686 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.276e+01 2.490e+01 2.765e+01 6.193e+01, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 14:50:46,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3949090.0, ans=0.125 2024-08-18 14:50:54,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8300, loss[loss=0.1058, beats_loss=0.01137, ecapa_loss=0.0001198, whisper_loss=0.09323, over 22476.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001438, whisper_loss=0.089, over 3887306.73 frames.
], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:50:59,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3949190.0, ans=0.125 2024-08-18 14:51:06,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3949290.0, ans=0.1 2024-08-18 14:51:13,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3949290.0, ans=0.2 2024-08-18 14:51:19,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3949390.0, ans=0.0 2024-08-18 14:51:20,882 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 14:51:23,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=3949390.0, ans=0.125 2024-08-18 14:51:30,412 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.49 vs. limit=10.0 2024-08-18 14:51:57,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8350, loss[loss=0.1144, beats_loss=0.008354, ecapa_loss=0.0001807, whisper_loss=0.1043, over 16292.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001455, whisper_loss=0.08954, over 3905803.30 frames. ], batch size: 67, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:51:58,600 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
21 from LS+wenet, 29 from Vox, 37 from AS 2024-08-18 14:52:12,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=3949790.0, ans=0.125 2024-08-18 14:52:21,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3949890.0, ans=0.0 2024-08-18 14:52:29,383 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 14:52:31,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2024-08-18 14:52:51,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2024-08-18 14:52:52,732 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.289e+01 2.579e+01 2.822e+01 1.067e+02, threshold=5.159e+01, percent-clipped=1.0 2024-08-18 14:52:56,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3950090.0, ans=0.125 2024-08-18 14:53:04,044 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8400, loss[loss=0.1063, beats_loss=0.009336, ecapa_loss=0.0001702, whisper_loss=0.09527, over 17464.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001456, whisper_loss=0.08946, over 3901887.79 frames. ], batch size: 70, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:53:04,209 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 29 from Vox, 40 from AS 2024-08-18 14:53:06,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3950190.0, ans=0.125 2024-08-18 14:53:13,666 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
23 from LS+wenet, 18 from Vox, 19 from AS 2024-08-18 14:53:18,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3950290.0, ans=0.05 2024-08-18 14:53:28,911 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 from AS 2024-08-18 14:53:32,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3950390.0, ans=0.0 2024-08-18 14:53:38,314 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 18 from Vox, 27 from AS 2024-08-18 14:54:05,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3950590.0, ans=0.125 2024-08-18 14:54:10,526 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8450, loss[loss=0.1185, beats_loss=0.006733, ecapa_loss=0.0001589, whisper_loss=0.1102, over 15369.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001465, whisper_loss=0.09001, over 3891940.92 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:54:12,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=15.0 2024-08-18 14:54:18,233 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
30 from LS+wenet, 22 from Vox, 37 from AS 2024-08-18 14:54:18,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3950690.0, ans=0.1 2024-08-18 14:54:23,468 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09940025955438614, model_norm_threshold=51.58603286743164 2024-08-18 14:54:23,640 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.828e+04, grad_sumsq=5.828e+04, orig_rms_sq=1.000e+00 2024-08-18 14:54:41,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3950890.0, ans=0.1 2024-08-18 14:54:42,065 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 23 from Vox, 28 from AS 2024-08-18 14:54:47,008 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 15 from Vox, 46 from AS 2024-08-18 14:54:51,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3950990.0, ans=0.125 2024-08-18 14:54:58,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=3950990.0, ans=0.0 2024-08-18 14:55:01,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3951090.0, ans=0.125 2024-08-18 14:55:02,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs.
limit=22.5 2024-08-18 14:55:04,301 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.912e+01 2.244e+01 2.462e+01 2.718e+01 5.190e+02, threshold=4.924e+01, percent-clipped=1.0 2024-08-18 14:55:09,018 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2024-08-18 14:55:11,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3951090.0, ans=0.95 2024-08-18 14:55:12,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3951090.0, ans=0.0 2024-08-18 14:55:14,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8500, loss[loss=0.1123, beats_loss=0.007375, ecapa_loss=0.0001681, whisper_loss=0.1032, over 22948.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001461, whisper_loss=0.09012, over 3893965.11 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:55:18,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3951190.0, ans=0.125 2024-08-18 14:55:32,021 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 from AS 2024-08-18 14:55:38,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=3951390.0, ans=0.2 2024-08-18 14:55:42,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3951390.0, ans=0.1 2024-08-18 14:55:45,357 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.67 vs.
limit=22.5 2024-08-18 14:55:50,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3951390.0, ans=0.125 2024-08-18 14:55:53,645 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 25 from Vox, 32 from AS 2024-08-18 14:55:55,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3951490.0, ans=0.1 2024-08-18 14:56:16,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8550, loss[loss=0.114, beats_loss=0.01011, ecapa_loss=0.0001291, whisper_loss=0.1026, over 23722.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001454, whisper_loss=0.08996, over 3891834.54 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 14:56:27,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3951690.0, ans=0.125 2024-08-18 14:56:52,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=3951890.0, ans=0.125 2024-08-18 14:57:10,811 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.534e+01 2.714e+01 3.031e+01 4.468e+01, threshold=5.428e+01, percent-clipped=0.0 2024-08-18 14:57:19,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8600, loss[loss=0.1073, beats_loss=0.009153, ecapa_loss=0.0001336, whisper_loss=0.09685, over 19604.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000144, whisper_loss=0.09051, over 3887811.62 frames. ], batch size: 76, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:57:20,919 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
17 from LS+wenet, 21 from Vox, 24 from AS 2024-08-18 14:57:27,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3952190.0, ans=0.0 2024-08-18 14:57:59,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3952490.0, ans=0.1 2024-08-18 14:58:01,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3952490.0, ans=0.07 2024-08-18 14:58:11,048 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 20 from Vox, 33 from AS 2024-08-18 14:58:21,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8650, loss[loss=0.102, beats_loss=0.009643, ecapa_loss=0.0001977, whisper_loss=0.0904, over 14303.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001443, whisper_loss=0.09032, over 3845344.27 frames. ], batch size: 62, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:58:33,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3952790.0, ans=0.0 2024-08-18 14:58:37,612 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 18 from Vox, 23 from AS 2024-08-18 14:58:39,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3952790.0, ans=0.0 2024-08-18 14:58:46,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-18 14:58:46,189 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=15.0 2024-08-18 14:58:55,144 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
47 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 14:58:55,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3952890.0, ans=0.1 2024-08-18 14:59:08,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3952990.0, ans=0.0 2024-08-18 14:59:11,699 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3953090.0, ans=0.125 2024-08-18 14:59:15,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.252e+01 2.415e+01 2.736e+01 4.412e+01, threshold=4.831e+01, percent-clipped=0.0 2024-08-18 14:59:15,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3953090.0, ans=0.0 2024-08-18 14:59:19,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3953090.0, ans=0.1 2024-08-18 14:59:23,848 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8700, loss[loss=0.09829, beats_loss=0.01187, ecapa_loss=0.0001315, whisper_loss=0.08511, over 17069.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01048, ecapa_loss=0.0001445, whisper_loss=0.09001, over 3854614.47 frames. ], batch size: 70, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 14:59:25,273 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 23 from Vox, 40 from AS 2024-08-18 14:59:28,790 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
15 from LS+wenet, 23 from Vox, 29 from AS 2024-08-18 14:59:35,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3953290.0, ans=0.125 2024-08-18 14:59:41,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3953290.0, ans=0.125 2024-08-18 14:59:47,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2024-08-18 15:00:18,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3953590.0, ans=0.125 2024-08-18 15:00:21,076 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 from AS 2024-08-18 15:00:25,990 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8750, loss[loss=0.08985, beats_loss=0.01355, ecapa_loss=0.0001018, whisper_loss=0.07529, over 22886.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001438, whisper_loss=0.09005, over 3833542.63 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:00:42,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3953790.0, ans=0.0 2024-08-18 15:00:42,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.46 vs. limit=15.0 2024-08-18 15:00:47,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3953790.0, ans=0.2 2024-08-18 15:00:48,720 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:00:53,309 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
20 from LS+wenet, 20 from Vox, 35 from AS 2024-08-18 15:00:54,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3953890.0, ans=0.0 2024-08-18 15:01:07,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3953990.0, ans=0.2 2024-08-18 15:01:07,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-18 15:01:19,353 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.312e+01 2.474e+01 2.763e+01 1.199e+02, threshold=4.947e+01, percent-clipped=1.0 2024-08-18 15:01:28,078 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8800, loss[loss=0.09768, beats_loss=0.01192, ecapa_loss=0.0001262, whisper_loss=0.08449, over 18391.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01052, ecapa_loss=0.0001426, whisper_loss=0.0905, over 3809667.36 frames. ], batch size: 73, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:01:38,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3954190.0, ans=0.1 2024-08-18 15:01:58,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.30 vs. limit=15.0 2024-08-18 15:02:00,205 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 13 from LS+wenet, 26 from Vox, 29 from AS 2024-08-18 15:02:23,346 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-18 15:02:29,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3954690.0, ans=0.125 2024-08-18 15:02:30,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8850, loss[loss=0.07833, beats_loss=0.0143, ecapa_loss=0.0001028, whisper_loss=0.06301, over 15796.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.08984, over 3839744.55 frames. ], batch size: 60, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:02:51,603 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04036853462457657, model_norm_threshold=49.47042465209961 2024-08-18 15:02:51,772 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.484e+05, grad_sumsq=1.484e+05, orig_rms_sq=1.000e+00 2024-08-18 15:02:57,418 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 19 from Vox, 23 from AS 2024-08-18 15:03:04,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=3954890.0, ans=0.05 2024-08-18 15:03:14,819 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 18 from Vox, 40 from AS 2024-08-18 15:03:15,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3954990.0, ans=0.125 2024-08-18 15:03:24,438 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.312e+01 2.617e+01 3.003e+01 1.225e+03, threshold=5.234e+01, percent-clipped=1.0 2024-08-18 15:03:29,586 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
35 from LS+wenet, 22 from Vox, 33 from AS 2024-08-18 15:03:31,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3955090.0, ans=0.125 2024-08-18 15:03:33,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8900, loss[loss=0.1007, beats_loss=0.01011, ecapa_loss=0.0001388, whisper_loss=0.08925, over 22465.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001436, whisper_loss=0.08954, over 3872777.71 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:03:38,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3955190.0, ans=0.125 2024-08-18 15:03:47,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3955290.0, ans=0.0 2024-08-18 15:03:52,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3955290.0, ans=0.0 2024-08-18 15:03:54,508 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 9 from Vox, 36 from AS 2024-08-18 15:04:05,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2024-08-18 15:04:15,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=3955490.0, ans=0.2 2024-08-18 15:04:21,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3955490.0, ans=0.125 2024-08-18 15:04:35,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 8950, loss[loss=0.1356, beats_loss=0.007747, ecapa_loss=0.000137, whisper_loss=0.1265, over 22957.00 frames.
], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.000143, whisper_loss=0.09001, over 3849373.13 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:05:28,575 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 14 from Vox, 40 from AS 2024-08-18 15:05:29,629 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.620e+01 2.255e+01 2.650e+01 2.885e+01 7.175e+01, threshold=5.300e+01, percent-clipped=2.0 2024-08-18 15:05:38,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9000, loss[loss=0.1126, beats_loss=0.01006, ecapa_loss=0.0001149, whisper_loss=0.1014, over 23086.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01059, ecapa_loss=0.0001435, whisper_loss=0.09039, over 3857509.50 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:05:38,070 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 15:06:15,558 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005205, whisper_loss=0.2465, over 922467.00 frames. 2024-08-18 15:06:34,013 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on SV_voxceleb1: loss=0.004102, beats_loss=0, ecapa_loss=0.0004102, whisper_loss=0, over 939242.00 frames. 2024-08-18 15:08:24,127 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 15:08:24,131 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 15:08:29,989 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.44 vs.
limit=22.5 2024-08-18 15:08:34,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=3956190.0, ans=0.2 2024-08-18 15:08:48,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3956390.0, ans=0.0 2024-08-18 15:08:48,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3956390.0, ans=0.0 2024-08-18 15:08:54,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0 2024-08-18 15:08:56,510 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 20 from Vox, 40 from AS 2024-08-18 15:09:14,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3956590.0, ans=0.2 2024-08-18 15:09:26,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9050, loss[loss=0.09586, beats_loss=0.01205, ecapa_loss=0.0001455, whisper_loss=0.08235, over 22709.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001436, whisper_loss=0.09055, over 3851682.65 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:09:27,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3956690.0, ans=0.0 2024-08-18 15:09:31,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=12.0 2024-08-18 15:09:47,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs.
limit=15.0 2024-08-18 15:09:48,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3956790.0, ans=0.0 2024-08-18 15:09:51,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3956890.0, ans=0.0 2024-08-18 15:09:58,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2024-08-18 15:10:00,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3956890.0, ans=0.0 2024-08-18 15:10:07,668 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 21 from Vox, 22 from AS 2024-08-18 15:10:19,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.267e+01 2.520e+01 2.853e+01 4.367e+01, threshold=5.041e+01, percent-clipped=0.0 2024-08-18 15:10:28,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9100, loss[loss=0.1074, beats_loss=0.008012, ecapa_loss=0.000148, whisper_loss=0.0979, over 16632.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001441, whisper_loss=0.09073, over 3821171.13 frames. ], batch size: 62, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:10:39,118 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 20 from Vox, 24 from AS 2024-08-18 15:11:15,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-18 15:11:20,133 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts.
30 from LS+wenet, 21 from Vox, 26 from AS 2024-08-18 15:11:21,685 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3957590.0, ans=0.0 2024-08-18 15:11:23,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3957590.0, ans=0.125 2024-08-18 15:11:29,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3957590.0, ans=0.125 2024-08-18 15:11:30,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9150, loss[loss=0.1014, beats_loss=0.01063, ecapa_loss=0.0001223, whisper_loss=0.08957, over 20916.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01044, ecapa_loss=0.0001449, whisper_loss=0.09111, over 3854576.23 frames. ], batch size: 81, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:11:35,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3957690.0, ans=0.125 2024-08-18 15:11:35,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-08-18 15:11:36,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3957690.0, ans=0.0 2024-08-18 15:12:19,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=3957990.0, ans=0.125 2024-08-18 15:12:19,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3957990.0, ans=0.0 2024-08-18 15:12:22,716 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts.
20 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 15:12:24,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.708e+01 2.284e+01 2.490e+01 2.890e+01 6.008e+01, threshold=4.980e+01, percent-clipped=1.0 2024-08-18 15:12:33,867 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9200, loss[loss=0.1035, beats_loss=0.0122, ecapa_loss=0.0001057, whisper_loss=0.09022, over 23507.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01036, ecapa_loss=0.000147, whisper_loss=0.09142, over 3903972.90 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:12:55,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3958290.0, ans=0.0 2024-08-18 15:12:56,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-08-18 15:13:09,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2024-08-18 15:13:23,811 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 from AS 2024-08-18 15:13:35,080 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9250, loss[loss=0.1151, beats_loss=0.01151, ecapa_loss=0.0001584, whisper_loss=0.102, over 22002.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001482, whisper_loss=0.09115, over 3892113.23 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:13:39,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3958690.0, ans=0.2 2024-08-18 15:13:53,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs.
limit=12.0 2024-08-18 15:14:00,639 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5 2024-08-18 15:14:06,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3958890.0, ans=0.0 2024-08-18 15:14:10,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.18 vs. limit=15.0 2024-08-18 15:14:13,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3958990.0, ans=0.1 2024-08-18 15:14:18,860 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 from AS 2024-08-18 15:14:20,071 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 16 from Vox, 23 from AS 2024-08-18 15:14:20,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=3958990.0, ans=0.2 2024-08-18 15:14:25,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3959090.0, ans=0.125 2024-08-18 15:14:27,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3959090.0, ans=0.05 2024-08-18 15:14:28,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.320e+01 2.629e+01 2.972e+01 5.900e+01, threshold=5.258e+01, percent-clipped=2.0 2024-08-18 15:14:31,152 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 29 from Vox, 31 from AS 2024-08-18 15:14:37,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9300, loss[loss=0.08591, beats_loss=0.009967, ecapa_loss=0.0001887, whisper_loss=0.07406, over 16132.00 frames.
], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001466, whisper_loss=0.09012, over 3886925.36 frames. ], batch size: 68, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:14:45,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3959190.0, ans=0.0 2024-08-18 15:14:56,995 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=22.5 2024-08-18 15:15:01,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3959390.0, ans=0.1 2024-08-18 15:15:06,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3959390.0, ans=0.1 2024-08-18 15:15:09,253 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 15 from Vox, 35 from AS 2024-08-18 15:15:09,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3959390.0, ans=0.125 2024-08-18 15:15:39,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=3959590.0, ans=0.1 2024-08-18 15:15:41,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9350, loss[loss=0.07302, beats_loss=0.01185, ecapa_loss=0.0001339, whisper_loss=0.05983, over 18536.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001452, whisper_loss=0.09011, over 3872374.21 frames.
], batch size: 73, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:15:43,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3959690.0, ans=0.125 2024-08-18 15:15:48,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=3959690.0, ans=0.125 2024-08-18 15:15:49,153 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 15:15:58,770 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 17 from LS+wenet, 20 from Vox, 36 from AS 2024-08-18 15:16:12,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-08-18 15:16:19,044 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 16 from Vox, 34 from AS 2024-08-18 15:16:26,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3959990.0, ans=0.125 2024-08-18 15:16:29,104 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 15:16:29,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3959990.0, ans=0.0 2024-08-18 15:16:37,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.319e+01 2.506e+01 2.733e+01 5.206e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-18 15:16:39,324 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 18 from Vox, 36 from AS 2024-08-18 15:16:47,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9400, loss[loss=0.1151, beats_loss=0.0105, ecapa_loss=0.0001506, whisper_loss=0.1031, over 21406.00 frames.
], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001459, whisper_loss=0.08971, over 3877273.41 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:16:52,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=3960190.0, ans=0.125 2024-08-18 15:16:53,280 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 34 from Vox, 34 from AS 2024-08-18 15:16:53,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2024-08-18 15:16:56,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=3960190.0, ans=15.0 2024-08-18 15:17:10,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3960290.0, ans=0.0 2024-08-18 15:17:18,781 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 23 from Vox, 36 from AS 2024-08-18 15:17:45,540 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3960590.0, ans=10.0 2024-08-18 15:17:52,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=3960690.0, ans=0.09899494936611666 2024-08-18 15:17:53,130 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9450, loss[loss=0.1208, beats_loss=0.008852, ecapa_loss=0.000165, whisper_loss=0.1103, over 21185.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01059, ecapa_loss=0.0001469, whisper_loss=0.0894, over 3853908.29 frames.
], batch size: 86, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:18:00,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3960690.0, ans=0.0 2024-08-18 15:18:03,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3960690.0, ans=0.125 2024-08-18 15:18:06,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3960790.0, ans=0.0 2024-08-18 15:18:12,068 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.09 vs. limit=22.5 2024-08-18 15:18:13,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3960790.0, ans=0.0 2024-08-18 15:18:23,834 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 15:18:24,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3960890.0, ans=0.0 2024-08-18 15:18:27,177 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 25 from Vox, 26 from AS 2024-08-18 15:18:33,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=3960990.0, ans=0.0 2024-08-18 15:18:35,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3960990.0, ans=0.125 2024-08-18 15:18:35,149 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.72 vs.
limit=22.5 2024-08-18 15:18:42,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3960990.0, ans=0.2 2024-08-18 15:18:50,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.723e+01 2.322e+01 2.578e+01 2.897e+01 2.615e+02, threshold=5.157e+01, percent-clipped=1.0 2024-08-18 15:18:54,221 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 17 from Vox, 33 from AS 2024-08-18 15:18:58,710 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 from AS 2024-08-18 15:18:59,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9500, loss[loss=0.1105, beats_loss=0.01128, ecapa_loss=0.0001311, whisper_loss=0.0979, over 23000.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001462, whisper_loss=0.08919, over 3871189.34 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:19:07,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3961190.0, ans=0.125 2024-08-18 15:19:19,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3961290.0, ans=0.125 2024-08-18 15:19:44,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3961490.0, ans=0.0 2024-08-18 15:20:11,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9550, loss[loss=0.1178, beats_loss=0.008823, ecapa_loss=0.0001459, whisper_loss=0.1075, over 13498.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001462, whisper_loss=0.08896, over 3858042.28 frames.
], batch size: 53, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:20:23,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-18 15:20:26,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3961790.0, ans=0.0 2024-08-18 15:20:29,995 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 24 from Vox, 23 from AS 2024-08-18 15:20:35,560 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 22 from Vox, 26 from AS 2024-08-18 15:20:41,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3961890.0, ans=0.0 2024-08-18 15:21:02,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=3961990.0, ans=0.0 2024-08-18 15:21:09,266 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 15:21:10,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.297e+01 2.564e+01 2.916e+01 8.592e+01, threshold=5.127e+01, percent-clipped=1.0 2024-08-18 15:21:18,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3962090.0, ans=0.1 2024-08-18 15:21:20,408 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9600, loss[loss=0.09985, beats_loss=0.01163, ecapa_loss=0.000123, whisper_loss=0.08698, over 22820.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01051, ecapa_loss=0.0001463, whisper_loss=0.08854, over 3835822.21 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:21:33,556 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts.
20 from LS+wenet, 12 from Vox, 26 from AS 2024-08-18 15:21:41,878 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:21:50,251 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2024-08-18 15:22:04,753 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0 2024-08-18 15:22:10,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3962490.0, ans=0.125 2024-08-18 15:22:18,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3962590.0, ans=0.1 2024-08-18 15:22:18,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3962590.0, ans=0.125 2024-08-18 15:22:21,069 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:22:28,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9650, loss[loss=0.09795, beats_loss=0.01257, ecapa_loss=0.0001186, whisper_loss=0.0842, over 20488.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001465, whisper_loss=0.08984, over 3832276.52 frames. ], batch size: 82, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:22:29,167 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
29 from LS+wenet, 28 from Vox, 31 from AS 2024-08-18 15:22:47,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3962790.0, ans=0.1 2024-08-18 15:23:02,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3962890.0, ans=0.125 2024-08-18 15:23:07,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=3962990.0, ans=0.125 2024-08-18 15:23:20,729 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.62 vs. limit=15.0 2024-08-18 15:23:22,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3963090.0, ans=0.125 2024-08-18 15:23:26,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.385e+01 2.565e+01 2.930e+01 2.026e+02, threshold=5.129e+01, percent-clipped=1.0 2024-08-18 15:23:33,220 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 24 from Vox, 35 from AS 2024-08-18 15:23:37,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9700, loss[loss=0.1126, beats_loss=0.008802, ecapa_loss=0.000173, whisper_loss=0.1021, over 19485.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001481, whisper_loss=0.09056, over 3823228.77 frames. ], batch size: 80, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:23:56,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-08-18 15:24:12,258 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 22 from Vox, 40 from AS 2024-08-18 15:24:22,069 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts.
25 from LS+wenet, 28 from Vox, 41 from AS 2024-08-18 15:24:29,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3963490.0, ans=0.09899494936611666 2024-08-18 15:24:50,581 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9750, loss[loss=0.08176, beats_loss=0.01195, ecapa_loss=0.0001248, whisper_loss=0.06856, over 19031.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001476, whisper_loss=0.09052, over 3855305.05 frames. ], batch size: 80, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:24:53,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2024-08-18 15:24:58,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3963690.0, ans=0.125 2024-08-18 15:25:10,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3963790.0, ans=0.125 2024-08-18 15:25:33,741 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 23 from Vox, 30 from AS 2024-08-18 15:25:34,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3963990.0, ans=0.0 2024-08-18 15:25:35,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.40 vs.
limit=22.5 2024-08-18 15:25:44,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=3963990.0, ans=0.125 2024-08-18 15:25:51,336 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.207e+01 2.474e+01 2.731e+01 4.379e+01, threshold=4.949e+01, percent-clipped=0.0 2024-08-18 15:25:53,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964090.0, ans=0.1 2024-08-18 15:25:58,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=3964090.0, ans=0.125 2024-08-18 15:26:00,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9800, loss[loss=0.09841, beats_loss=0.01235, ecapa_loss=0.0001226, whisper_loss=0.08483, over 18166.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001472, whisper_loss=0.0904, over 3822353.94 frames. ], batch size: 72, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:26:18,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3964290.0, ans=0.0 2024-08-18 15:26:28,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=22.5 2024-08-18 15:26:31,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3964390.0, ans=0.1 2024-08-18 15:26:35,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3964390.0, ans=0.1 2024-08-18 15:26:51,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3964490.0, ans=0.125 2024-08-18 15:26:55,307 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 from AS 2024-08-18 15:26:56,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0 2024-08-18 15:27:10,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9850, loss[loss=0.1084, beats_loss=0.01087, ecapa_loss=0.0001565, whisper_loss=0.096, over 13435.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001471, whisper_loss=0.09085, over 3813518.41 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:27:37,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3964890.0, ans=0.125 2024-08-18 15:27:38,492 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 24 from Vox, 20 from AS 2024-08-18 15:28:09,195 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.298e+01 2.545e+01 2.787e+01 5.182e+01, threshold=5.091e+01, percent-clipped=2.0 2024-08-18 15:28:18,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9900, loss[loss=0.105, beats_loss=0.01057, ecapa_loss=0.0001482, whisper_loss=0.09298, over 23218.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01038, ecapa_loss=0.0001465, whisper_loss=0.09131, over 3876703.39 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:28:21,444 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2024-08-18 15:28:34,206 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 26 from Vox, 26 from AS 2024-08-18 15:29:06,369 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
29 from LS+wenet, 22 from Vox, 34 from AS 2024-08-18 15:29:15,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3965590.0, ans=0.1 2024-08-18 15:29:25,005 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 9950, loss[loss=0.09486, beats_loss=0.01508, ecapa_loss=0.0001236, whisper_loss=0.07854, over 21346.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001464, whisper_loss=0.09046, over 3867665.60 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:29:28,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=3965690.0, ans=10.0 2024-08-18 15:29:32,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. limit=5.0 2024-08-18 15:30:09,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3965990.0, ans=0.125 2024-08-18 15:30:10,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=3965990.0, ans=0.0 2024-08-18 15:30:21,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.257e+01 2.444e+01 2.772e+01 4.436e+01, threshold=4.888e+01, percent-clipped=0.0 2024-08-18 15:30:30,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10000, loss[loss=0.1272, beats_loss=0.009084, ecapa_loss=0.0001707, whisper_loss=0.1164, over 23162.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01046, ecapa_loss=0.0001469, whisper_loss=0.09082, over 3853532.33 frames.
], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:30:31,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3966190.0, ans=0.1 2024-08-18 15:30:50,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=15.0 2024-08-18 15:31:17,553 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.95 vs. limit=22.5 2024-08-18 15:31:20,749 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 14 from Vox, 47 from AS 2024-08-18 15:31:29,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=3966590.0, ans=0.95 2024-08-18 15:31:36,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10050, loss[loss=0.1113, beats_loss=0.008304, ecapa_loss=0.0001466, whisper_loss=0.1015, over 15315.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001459, whisper_loss=0.09045, over 3861609.60 frames. ], batch size: 61, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:31:44,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3966690.0, ans=0.125 2024-08-18 15:32:01,596 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 30 from Vox, 37 from AS 2024-08-18 15:32:05,238 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts.
36 from LS+wenet, 21 from Vox, 33 from AS 2024-08-18 15:32:16,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=3966990.0, ans=0.125 2024-08-18 15:32:35,668 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.258e+01 2.497e+01 2.785e+01 5.136e+01, threshold=4.994e+01, percent-clipped=1.0 2024-08-18 15:32:45,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10100, loss[loss=0.08303, beats_loss=0.01122, ecapa_loss=0.0001521, whisper_loss=0.07029, over 13165.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001454, whisper_loss=0.09097, over 3874271.68 frames. ], batch size: 55, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:32:48,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3967190.0, ans=0.0 2024-08-18 15:32:53,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3967190.0, ans=0.05 2024-08-18 15:32:56,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3967290.0, ans=0.0 2024-08-18 15:33:03,161 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 25 from Vox, 31 from AS 2024-08-18 15:33:13,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3967390.0, ans=0.125 2024-08-18 15:33:14,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=3967390.0, ans=0.0 2024-08-18 15:33:50,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10150, loss[loss=0.1133, beats_loss=0.008743, ecapa_loss=0.0001381, whisper_loss=0.1032, over 17765.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01046, ecapa_loss=0.0001464, whisper_loss=0.09144, over 3882830.76 frames.
], batch size: 66, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:34:14,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3967790.0, ans=0.0 2024-08-18 15:34:16,665 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.15 vs. limit=22.5 2024-08-18 15:34:19,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3967890.0, ans=0.0 2024-08-18 15:34:23,665 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 from AS 2024-08-18 15:34:26,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3967890.0, ans=0.0 2024-08-18 15:34:30,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=3967890.0, ans=0.2 2024-08-18 15:34:34,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3967990.0, ans=0.1 2024-08-18 15:34:38,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3967990.0, ans=0.125 2024-08-18 15:34:39,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3967990.0, ans=0.1 2024-08-18 15:34:43,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3967990.0, ans=0.0 2024-08-18 15:34:50,169 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.305e+01 2.555e+01 2.872e+01 4.370e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-18 15:34:59,493 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10200, loss[loss=0.1115, beats_loss=0.01042,
ecapa_loss=0.0001408, whisper_loss=0.09969, over 19540.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01042, ecapa_loss=0.000146, whisper_loss=0.09162, over 3909992.16 frames. ], batch size: 78, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:35:20,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-18 15:35:42,342 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 15:35:47,318 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 17 from Vox, 48 from AS 2024-08-18 15:36:04,960 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10250, loss[loss=0.1125, beats_loss=0.009777, ecapa_loss=0.0001345, whisper_loss=0.1014, over 22268.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001451, whisper_loss=0.09121, over 3909742.64 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:36:08,549 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.18 vs. limit=22.5 2024-08-18 15:36:14,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=3968690.0, ans=0.04949747468305833 2024-08-18 15:36:19,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=3968790.0, ans=0.0 2024-08-18 15:36:30,221 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts.
17 from LS+wenet, 17 from Vox, 28 from AS 2024-08-18 15:36:35,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3968890.0, ans=0.1 2024-08-18 15:37:01,342 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.987e+01 2.331e+01 2.519e+01 2.797e+01 4.005e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 15:37:02,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3969090.0, ans=0.125 2024-08-18 15:37:11,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10300, loss[loss=0.06838, beats_loss=0.01316, ecapa_loss=0.0001441, whisper_loss=0.05378, over 14513.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001447, whisper_loss=0.09079, over 3914534.62 frames. ], batch size: 59, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:37:11,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3969190.0, ans=0.2 2024-08-18 15:37:19,236 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS 2024-08-18 15:37:20,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=3969190.0, ans=0.07 2024-08-18 15:37:30,820 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 15:37:31,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=3969290.0, ans=0.125 2024-08-18 15:37:35,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3969290.0, ans=0.1 2024-08-18 15:37:40,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.86 vs.
limit=10.0 2024-08-18 15:38:04,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=3969490.0, ans=0.2 2024-08-18 15:38:17,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3969590.0, ans=0.125 2024-08-18 15:38:19,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3969690.0, ans=0.2 2024-08-18 15:38:20,132 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10350, loss[loss=0.09308, beats_loss=0.01234, ecapa_loss=0.0001308, whisper_loss=0.07943, over 22557.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.000145, whisper_loss=0.09032, over 3932505.16 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:38:23,414 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-18 15:38:23,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3969690.0, ans=0.125 2024-08-18 15:38:33,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.35 vs.
limit=15.0 2024-08-18 15:38:59,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=3969890.0, ans=0.125 2024-08-18 15:39:10,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=3969990.0, ans=0.2 2024-08-18 15:39:18,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3970090.0, ans=0.1 2024-08-18 15:39:20,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.307e+01 2.571e+01 2.921e+01 7.269e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-18 15:39:30,278 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10400, loss[loss=0.0872, beats_loss=0.01081, ecapa_loss=0.0001424, whisper_loss=0.07496, over 13674.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001455, whisper_loss=0.09029, over 3901038.80 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:39:38,216 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 18 from Vox, 38 from AS 2024-08-18 15:39:39,395 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 16 from Vox, 22 from AS 2024-08-18 15:39:48,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=3970290.0, ans=0.2 2024-08-18 15:40:03,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3970390.0, ans=0.125 2024-08-18 15:40:31,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3970590.0, ans=0.125 2024-08-18 15:40:38,021 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10450, loss[loss=0.08567, beats_loss=0.0114, ecapa_loss=0.0001252, whisper_loss=0.07302, over 13634.00 frames.
], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001446, whisper_loss=0.09028, over 3877495.76 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:40:45,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3970690.0, ans=0.0 2024-08-18 15:40:46,450 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 15:40:48,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-08-18 15:40:50,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3970790.0, ans=0.2 2024-08-18 15:41:02,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3970890.0, ans=0.0 2024-08-18 15:41:20,437 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 15:41:34,985 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.245e+01 2.442e+01 2.727e+01 4.233e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-18 15:41:39,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.15 vs. limit=22.5 2024-08-18 15:41:42,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3971090.0, ans=0.125 2024-08-18 15:41:42,918 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 from AS 2024-08-18 15:41:44,293 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10500, loss[loss=0.1099, beats_loss=0.01053, ecapa_loss=0.000153, whisper_loss=0.09786, over 17227.00 frames.
], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001453, whisper_loss=0.08974, over 3839201.10 frames. ], batch size: 69, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:41:45,633 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 25 from Vox, 39 from AS 2024-08-18 15:41:59,453 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 24 from Vox, 34 from AS 2024-08-18 15:42:02,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=3971290.0, ans=0.125 2024-08-18 15:42:50,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10550, loss[loss=0.1071, beats_loss=0.01089, ecapa_loss=0.0001312, whisper_loss=0.0949, over 23495.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001449, whisper_loss=0.09005, over 3849487.31 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:42:57,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.99 vs. limit=6.0 2024-08-18 15:43:09,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3971790.0, ans=0.125 2024-08-18 15:43:13,708 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 25 from Vox, 37 from AS 2024-08-18 15:43:20,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3971890.0, ans=0.125 2024-08-18 15:43:35,506 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 15:43:36,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs.
limit=15.0 2024-08-18 15:43:42,673 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 14 from Vox, 36 from AS 2024-08-18 15:43:45,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.358e+01 2.575e+01 2.904e+01 4.365e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-18 15:43:52,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=3972090.0, ans=0.0 2024-08-18 15:43:55,684 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10600, loss[loss=0.08642, beats_loss=0.009933, ecapa_loss=0.0001535, whisper_loss=0.07495, over 16177.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01042, ecapa_loss=0.0001449, whisper_loss=0.08996, over 3842198.88 frames. ], batch size: 66, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:44:02,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3972190.0, ans=0.1 2024-08-18 15:44:05,555 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2024-08-18 15:44:10,310 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 25 from Vox, 25 from AS 2024-08-18 15:44:18,152 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 from AS 2024-08-18 15:44:40,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=3972490.0, ans=0.0 2024-08-18 15:44:57,124 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 14 from LS+wenet, 22 from Vox, 31 from AS 2024-08-18 15:44:58,538 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts.
27 from LS+wenet, 18 from Vox, 28 from AS 2024-08-18 15:45:02,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10650, loss[loss=0.1242, beats_loss=0.006791, ecapa_loss=0.000197, whisper_loss=0.1155, over 18530.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001449, whisper_loss=0.0903, over 3834358.35 frames. ], batch size: 76, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:45:06,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.07 vs. limit=15.0 2024-08-18 15:45:08,504 WARNING [optim.py:496] (3/4) Scaling gradients by 0.027766374871134758, model_norm_threshold=51.50757598876953 2024-08-18 15:45:08,673 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.457e+05, grad_sumsq=1.339e+05, orig_rms_sq=3.328e+00 2024-08-18 15:45:22,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3972790.0, ans=0.1 2024-08-18 15:45:24,429 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 14 from Vox, 29 from AS 2024-08-18 15:45:26,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.91 vs. limit=22.5 2024-08-18 15:45:27,387 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts.
21 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 15:45:34,451 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=3972890.0, ans=0.125 2024-08-18 15:45:36,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=3972890.0, ans=0.0 2024-08-18 15:46:01,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.916e+01 2.310e+01 2.515e+01 2.823e+01 1.855e+03, threshold=5.029e+01, percent-clipped=1.0 2024-08-18 15:46:03,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-08-18 15:46:09,818 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 15:46:10,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10700, loss[loss=0.09416, beats_loss=0.01179, ecapa_loss=0.0001357, whisper_loss=0.08101, over 21852.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001452, whisper_loss=0.0905, over 3823974.46 frames. ], batch size: 89, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:46:22,756 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=12.0 2024-08-18 15:46:33,258 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 from AS 2024-08-18 15:46:43,632 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 11 from Vox, 35 from AS 2024-08-18 15:46:53,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.24 vs. limit=22.5 2024-08-18 15:47:22,606 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10750, loss[loss=0.1209, beats_loss=0.00946, ecapa_loss=0.0001427, whisper_loss=0.11, over 20733.00 frames.
], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001439, whisper_loss=0.09074, over 3853724.65 frames. ], batch size: 81, lr: 2.26e-03, grad_scale: 1.152921504606847e+18 2024-08-18 15:47:23,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-08-18 15:47:25,395 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 from AS 2024-08-18 15:47:39,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=3973790.0, ans=0.09899494936611666 2024-08-18 15:47:40,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3973790.0, ans=0.0 2024-08-18 15:47:57,884 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 27 from Vox, 40 from AS 2024-08-18 15:48:09,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.80 vs. limit=10.0 2024-08-18 15:48:13,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=3973990.0, ans=0.05 2024-08-18 15:48:27,964 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.366e+01 2.579e+01 2.828e+01 3.318e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 15:48:38,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10800, loss[loss=0.09091, beats_loss=0.01161, ecapa_loss=0.0001279, whisper_loss=0.07803, over 16827.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01039, ecapa_loss=0.0001446, whisper_loss=0.09144, over 3878518.69 frames.
], batch size: 64, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:48:41,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2024-08-18 15:48:51,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3974290.0, ans=0.125 2024-08-18 15:48:52,624 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 26 from Vox, 35 from AS 2024-08-18 15:48:56,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3974290.0, ans=15.0 2024-08-18 15:48:57,632 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 29 from Vox, 33 from AS 2024-08-18 15:49:10,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.51 vs. limit=22.5 2024-08-18 15:49:18,771 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 24 from Vox, 40 from AS 2024-08-18 15:49:30,599 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2024-08-18 15:49:54,539 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10850, loss[loss=0.1067, beats_loss=0.01076, ecapa_loss=0.00014, whisper_loss=0.09454, over 20326.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001443, whisper_loss=0.09074, over 3874905.73 frames.
], batch size: 83, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:49:56,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3974690.0, ans=0.0 2024-08-18 15:49:59,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3974690.0, ans=0.1 2024-08-18 15:50:13,067 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 17 from Vox, 37 from AS 2024-08-18 15:50:16,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3974790.0, ans=0.0 2024-08-18 15:50:17,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3974790.0, ans=0.125 2024-08-18 15:50:55,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=3975090.0, ans=0.125 2024-08-18 15:51:00,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3975090.0, ans=0.125 2024-08-18 15:51:01,347 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.964e+01 2.249e+01 2.453e+01 2.667e+01 3.964e+01, threshold=4.906e+01, percent-clipped=0.0 2024-08-18 15:51:04,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3975090.0, ans=0.125 2024-08-18 15:51:06,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=3975090.0, ans=0.2 2024-08-18 15:51:10,416 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10900, loss[loss=0.1094, beats_loss=0.007666, ecapa_loss=0.0001397, whisper_loss=0.1003, over 14447.00 frames.
], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09102, over 3903738.73 frames. ], batch size: 54, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:51:14,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=3975190.0, ans=0.05 2024-08-18 15:51:18,310 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 15:51:18,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3975190.0, ans=0.0 2024-08-18 15:51:38,718 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 15:51:41,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.92 vs. limit=22.5 2024-08-18 15:51:58,374 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 28 from Vox, 38 from AS 2024-08-18 15:51:58,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3975490.0, ans=0.125 2024-08-18 15:52:00,744 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.28 vs. limit=12.0 2024-08-18 15:52:01,167 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 from AS 2024-08-18 15:52:06,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.31 vs.
limit=15.0 2024-08-18 15:52:07,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3975490.0, ans=0.1 2024-08-18 15:52:13,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=3975590.0, ans=0.125 2024-08-18 15:52:27,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 10950, loss[loss=0.1112, beats_loss=0.01273, ecapa_loss=0.0001197, whisper_loss=0.09729, over 21738.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01041, ecapa_loss=0.0001442, whisper_loss=0.09114, over 3909930.06 frames. ], batch size: 88, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:52:32,309 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=12.0 2024-08-18 15:52:32,825 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 22 from Vox, 26 from AS 2024-08-18 15:52:34,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3975690.0, ans=0.125 2024-08-18 15:52:35,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3975690.0, ans=0.125 2024-08-18 15:52:36,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-18 15:53:24,718 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 11 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 15:53:33,581 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.325e+01 2.540e+01 2.829e+01 5.122e+01, threshold=5.080e+01, percent-clipped=1.0 2024-08-18 15:53:40,312 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 15:53:43,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11000, loss[loss=0.09536, beats_loss=0.008672, ecapa_loss=0.0001327, whisper_loss=0.08536, over 16913.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.0001434, whisper_loss=0.091, over 3909747.38 frames. ], batch size: 63, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:53:53,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3976190.0, ans=0.125 2024-08-18 15:54:00,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=3976290.0, ans=0.0 2024-08-18 15:54:05,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=3976290.0, ans=0.2 2024-08-18 15:54:05,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.23 vs. limit=10.0 2024-08-18 15:54:47,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3976490.0, ans=0.0 2024-08-18 15:55:01,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-18 15:55:05,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11050, loss[loss=0.09473, beats_loss=0.009614, ecapa_loss=0.0001392, whisper_loss=0.08373, over 21529.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.09087, over 3911522.06 frames. ], batch size: 87, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:55:10,385 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
30 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 15:55:23,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3976790.0, ans=0.1 2024-08-18 15:55:24,048 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 15:55:26,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=12.0 2024-08-18 15:55:34,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-08-18 15:55:35,042 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-18 15:55:59,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3976990.0, ans=0.0 2024-08-18 15:56:12,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.815e+01 2.281e+01 2.516e+01 2.908e+01 1.267e+02, threshold=5.032e+01, percent-clipped=1.0 2024-08-18 15:56:14,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3977090.0, ans=0.125 2024-08-18 15:56:21,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11100, loss[loss=0.103, beats_loss=0.01166, ecapa_loss=0.0001381, whisper_loss=0.08999, over 18958.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001434, whisper_loss=0.09065, over 3922786.19 frames. ], batch size: 75, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:56:33,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.87 vs. limit=22.5 2024-08-18 15:56:57,616 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-18 15:57:19,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3977590.0, ans=0.125 2024-08-18 15:57:34,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3977690.0, ans=0.0 2024-08-18 15:57:35,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11150, loss[loss=0.1079, beats_loss=0.01013, ecapa_loss=0.0001378, whisper_loss=0.09636, over 19499.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01038, ecapa_loss=0.000143, whisper_loss=0.09137, over 3923196.84 frames. ], batch size: 76, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:57:35,469 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 15:57:44,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0 2024-08-18 15:57:45,608 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 38 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 15:58:08,308 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-18 15:58:08,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3977890.0, ans=0.0 2024-08-18 15:58:18,770 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-18 15:58:23,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3977990.0, ans=0.0 2024-08-18 15:58:23,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=12.0 2024-08-18 15:58:32,953 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 15:58:38,481 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.725e+01 2.330e+01 2.608e+01 2.859e+01 1.941e+02, threshold=5.216e+01, percent-clipped=1.0 2024-08-18 15:58:38,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3978090.0, ans=10.0 2024-08-18 15:58:44,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=3978090.0, ans=8.0 2024-08-18 15:58:47,720 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11200, loss[loss=0.1003, beats_loss=0.01104, ecapa_loss=0.0001474, whisper_loss=0.0878, over 22614.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01034, ecapa_loss=0.0001435, whisper_loss=0.09182, over 3921551.82 frames. ], batch size: 94, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 15:58:51,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3978190.0, ans=0.0 2024-08-18 15:58:56,093 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-18 15:59:09,036 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 15:59:13,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=3978290.0, ans=0.0 2024-08-18 15:59:42,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3978490.0, ans=0.125 2024-08-18 15:59:51,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=3978590.0, ans=0.0 2024-08-18 15:59:54,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=3978590.0, ans=0.125 2024-08-18 16:00:06,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11250, loss[loss=0.09635, beats_loss=0.009546, ecapa_loss=0.0001877, whisper_loss=0.08493, over 15924.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01031, ecapa_loss=0.0001448, whisper_loss=0.09197, over 3925069.16 frames. ], batch size: 68, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:00:09,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3978690.0, ans=0.125 2024-08-18 16:00:12,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3978690.0, ans=0.05 2024-08-18 16:00:16,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.11 vs. 
limit=15.0 2024-08-18 16:00:27,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=3978790.0, ans=0.0 2024-08-18 16:00:27,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3978790.0, ans=0.0 2024-08-18 16:00:31,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0 2024-08-18 16:00:38,740 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 16:00:42,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.36 vs. limit=15.0 2024-08-18 16:00:44,534 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 16:00:47,426 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 16:00:55,666 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 16:01:09,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.90 vs. limit=15.0 2024-08-18 16:01:12,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.851e+01 2.408e+01 2.624e+01 3.093e+01 2.615e+02, threshold=5.248e+01, percent-clipped=2.0 2024-08-18 16:01:14,360 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
14 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 16:01:17,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=3979090.0, ans=0.2 2024-08-18 16:01:22,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11300, loss[loss=0.1221, beats_loss=0.008429, ecapa_loss=0.0001551, whisper_loss=0.1121, over 22666.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01032, ecapa_loss=0.0001439, whisper_loss=0.09206, over 3953704.80 frames. ], batch size: 90, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:01:39,546 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 16:01:42,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.97 vs. limit=15.0 2024-08-18 16:01:59,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3979390.0, ans=0.125 2024-08-18 16:02:03,447 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 16:02:12,186 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 16:02:17,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3979490.0, ans=0.1 2024-08-18 16:02:23,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3979590.0, ans=0.1 2024-08-18 16:02:38,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11350, loss[loss=0.07372, beats_loss=0.01235, ecapa_loss=0.0001218, whisper_loss=0.06015, over 17902.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01031, ecapa_loss=0.000144, whisper_loss=0.09164, over 3935670.12 frames. 
], batch size: 74, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:02:39,592 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=8.0 2024-08-18 16:02:42,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3979690.0, ans=0.0 2024-08-18 16:02:45,210 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-18 16:03:01,071 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 16:03:09,083 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-18 16:03:18,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3979890.0, ans=0.125 2024-08-18 16:03:22,192 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 16:03:24,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3979990.0, ans=0.125 2024-08-18 16:03:32,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3979990.0, ans=0.125 2024-08-18 16:03:36,607 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 16:03:45,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3980090.0, ans=0.0 2024-08-18 16:03:46,648 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.287e+01 2.489e+01 2.829e+01 3.988e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-18 16:03:48,688 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 16:03:55,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11400, loss[loss=0.08525, beats_loss=0.01182, ecapa_loss=0.0001639, whisper_loss=0.07179, over 21897.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01031, ecapa_loss=0.0001445, whisper_loss=0.09136, over 3906604.27 frames. ], batch size: 93, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:03:57,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=3980190.0, ans=0.2 2024-08-18 16:04:04,178 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-18 16:04:10,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-18 16:04:11,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=3980290.0, ans=10.0 2024-08-18 16:04:12,652 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-18 16:04:37,957 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-18 16:04:47,197 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 27 from Vox, 41 fro AS 2024-08-18 16:04:49,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3980490.0, ans=0.125 2024-08-18 16:04:53,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=3980490.0, ans=0.1 2024-08-18 16:04:58,924 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
19 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 16:04:59,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3980590.0, ans=0.125 2024-08-18 16:05:13,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11450, loss[loss=0.09258, beats_loss=0.01271, ecapa_loss=0.0001401, whisper_loss=0.07847, over 14384.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01033, ecapa_loss=0.0001457, whisper_loss=0.0912, over 3882090.36 frames. ], batch size: 56, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:05:29,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3980790.0, ans=0.125 2024-08-18 16:05:43,775 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:05:48,320 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 16:06:21,541 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 22 from LS+wenet, 30 from Vox, 34 fro AS 2024-08-18 16:06:23,128 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-18 16:06:27,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.331e+01 2.551e+01 2.848e+01 4.379e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-18 16:06:29,473 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
33 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 16:06:36,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3981190.0, ans=0.0 2024-08-18 16:06:36,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3981190.0, ans=0.2 2024-08-18 16:06:37,184 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11500, loss[loss=0.1047, beats_loss=0.009463, ecapa_loss=0.0001402, whisper_loss=0.09383, over 19081.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01035, ecapa_loss=0.000145, whisper_loss=0.09109, over 3869980.10 frames. ], batch size: 77, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:06:49,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.42 vs. limit=15.0 2024-08-18 16:06:52,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3981290.0, ans=0.125 2024-08-18 16:06:55,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3981290.0, ans=0.125 2024-08-18 16:07:16,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3981390.0, ans=0.0 2024-08-18 16:07:25,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3981390.0, ans=0.0 2024-08-18 16:07:49,899 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-18 16:08:05,526 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
30 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 16:08:15,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3981590.0, ans=0.125 2024-08-18 16:08:18,256 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11550, loss[loss=0.09462, beats_loss=0.01083, ecapa_loss=0.0001445, whisper_loss=0.08234, over 22747.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001449, whisper_loss=0.09072, over 3889542.07 frames. ], batch size: 91, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:08:20,102 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 16:08:26,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3981690.0, ans=0.1 2024-08-18 16:08:35,834 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-18 16:08:38,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3981790.0, ans=0.125 2024-08-18 16:09:05,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=3981890.0, ans=10.0 2024-08-18 16:09:20,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=3981990.0, ans=0.125 2024-08-18 16:09:23,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=3981990.0, ans=15.0 2024-08-18 16:09:23,900 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-18 16:09:43,836 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.59 vs. 
limit=12.0 2024-08-18 16:09:48,205 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 28 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-18 16:09:53,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.363e+01 2.578e+01 2.890e+01 4.329e+01, threshold=5.155e+01, percent-clipped=0.0 2024-08-18 16:09:54,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3982090.0, ans=0.125 2024-08-18 16:09:57,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3982090.0, ans=0.125 2024-08-18 16:10:07,639 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.276e-02 2024-08-18 16:10:08,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11600, loss[loss=0.106, beats_loss=0.01088, ecapa_loss=0.0001303, whisper_loss=0.09385, over 17883.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001447, whisper_loss=0.09074, over 3869910.68 frames. ], batch size: 68, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:10:09,875 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 16:10:12,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.53 vs. 
limit=22.5 2024-08-18 16:10:19,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=3982190.0, ans=0.125 2024-08-18 16:10:29,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=3982190.0, ans=0.125 2024-08-18 16:10:29,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=3982190.0, ans=15.0 2024-08-18 16:10:59,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3982390.0, ans=0.0 2024-08-18 16:11:03,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3982390.0, ans=0.125 2024-08-18 16:11:05,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=3982390.0, ans=0.125 2024-08-18 16:11:18,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=3982390.0, ans=0.025 2024-08-18 16:11:32,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-18 16:11:40,380 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 16:11:46,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=3982590.0, ans=0.07 2024-08-18 16:11:46,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.64 vs. 
limit=10.0 2024-08-18 16:12:10,427 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11650, loss[loss=0.1006, beats_loss=0.01017, ecapa_loss=0.000138, whisper_loss=0.08901, over 19585.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001445, whisper_loss=0.08997, over 3888458.13 frames. ], batch size: 78, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:12:16,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=3982690.0, ans=0.2 2024-08-18 16:12:16,931 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 16:12:19,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=3982690.0, ans=0.125 2024-08-18 16:12:25,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=3982690.0, ans=0.2 2024-08-18 16:12:25,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3982690.0, ans=0.1 2024-08-18 16:12:59,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3982890.0, ans=0.2 2024-08-18 16:13:03,921 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 16:13:38,236 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2024-08-18 16:13:40,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.90 vs. 
limit=15.0 2024-08-18 16:13:51,004 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.304e+01 2.600e+01 2.905e+01 3.001e+02, threshold=5.199e+01, percent-clipped=1.0 2024-08-18 16:14:00,461 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3983090.0, ans=0.125 2024-08-18 16:14:04,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11700, loss[loss=0.1079, beats_loss=0.01142, ecapa_loss=0.0001123, whisper_loss=0.09534, over 23553.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001437, whisper_loss=0.09103, over 3912851.91 frames. ], batch size: 92, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:14:05,164 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 30 from LS+wenet, 18 from Vox, 18 fro AS 2024-08-18 16:14:06,523 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-18 16:14:26,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3983290.0, ans=0.1 2024-08-18 16:14:44,821 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-18 16:14:49,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3983390.0, ans=0.0 2024-08-18 16:15:06,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3983490.0, ans=0.0 2024-08-18 16:15:18,473 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 16:15:30,001 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11750, loss[loss=0.1256, beats_loss=0.007511, ecapa_loss=0.0001438, whisper_loss=0.1167, over 15958.00 frames. 
], tot_loss[loss=0.1031, beats_loss=0.01056, ecapa_loss=0.0001434, whisper_loss=0.09107, over 3921870.52 frames. ], batch size: 58, lr: 2.26e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:15:31,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3983690.0, ans=0.0 2024-08-18 16:15:51,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=3983790.0, ans=0.035 2024-08-18 16:16:09,226 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 16:16:11,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=3983890.0, ans=0.5 2024-08-18 16:16:39,130 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.396e+01 2.657e+01 3.044e+01 4.817e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-18 16:16:48,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11800, loss[loss=0.1032, beats_loss=0.01021, ecapa_loss=0.000144, whisper_loss=0.09155, over 22800.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01059, ecapa_loss=0.0001432, whisper_loss=0.09123, over 3936714.00 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:17:16,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3984290.0, ans=0.0 2024-08-18 16:17:27,917 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-18 16:17:29,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=3984390.0, ans=0.015 2024-08-18 16:17:32,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=3984390.0, ans=0.0 2024-08-18 16:17:37,007 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 16:17:39,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=3984490.0, ans=0.2 2024-08-18 16:17:59,944 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-18 16:18:03,926 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 16:18:09,386 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11850, loss[loss=0.09665, beats_loss=0.01168, ecapa_loss=0.0001396, whisper_loss=0.08357, over 22519.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01053, ecapa_loss=0.0001436, whisper_loss=0.0912, over 3926257.38 frames. ], batch size: 93, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:18:12,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=3984690.0, ans=0.125 2024-08-18 16:18:26,504 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-18 16:18:32,967 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-18 16:18:33,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=3984790.0, ans=0.125 2024-08-18 16:18:35,020 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-18 16:18:44,551 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 16:18:46,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=3984890.0, ans=0.0 2024-08-18 16:18:49,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. limit=6.0 2024-08-18 16:18:54,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3984890.0, ans=0.1 2024-08-18 16:19:01,386 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 16:19:03,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3984990.0, ans=0.1 2024-08-18 16:19:17,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.255e+01 2.535e+01 2.793e+01 4.854e+01, threshold=5.071e+01, percent-clipped=0.0 2024-08-18 16:19:21,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3985090.0, ans=0.0 2024-08-18 16:19:24,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=3985090.0, ans=0.125 2024-08-18 16:19:25,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3985190.0, ans=0.0 2024-08-18 16:19:26,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11900, loss[loss=0.1137, beats_loss=0.01089, ecapa_loss=0.0001178, whisper_loss=0.1017, over 19700.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0106, ecapa_loss=0.0001425, whisper_loss=0.09115, over 3940992.61 frames. 
], batch size: 77, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:20:04,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=3985390.0, ans=0.0 2024-08-18 16:20:41,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=3985690.0, ans=0.125 2024-08-18 16:20:42,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 11950, loss[loss=0.103, beats_loss=0.01085, ecapa_loss=0.000147, whisper_loss=0.09065, over 18178.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001438, whisper_loss=0.09097, over 3885443.21 frames. ], batch size: 73, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:20:45,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2024-08-18 16:21:16,324 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 16:21:17,737 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 18 from LS+wenet, 28 from Vox, 43 fro AS 2024-08-18 16:21:19,674 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 16:21:40,320 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.997e+05 2024-08-18 16:21:45,774 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:21:48,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.62 vs. 
limit=10.0 2024-08-18 16:21:54,292 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.727e+01 2.273e+01 2.566e+01 2.844e+01 1.117e+02, threshold=5.132e+01, percent-clipped=1.0 2024-08-18 16:22:02,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=3986190.0, ans=0.2 2024-08-18 16:22:03,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12000, loss[loss=0.09196, beats_loss=0.01138, ecapa_loss=0.000129, whisper_loss=0.07929, over 21979.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001427, whisper_loss=0.08973, over 3879645.88 frames. ], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:22:03,596 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 16:22:37,172 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005101, whisper_loss=0.2481, over 922467.00 frames. 2024-08-18 16:22:55,550 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on SV_voxceleb1: loss=0.004067, beats_loss=0, ecapa_loss=0.0004067, whisper_loss=0, over 939242.00 frames. 2024-08-18 16:24:09,580 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.7436, 2.2448, 2.1940, 1.9429], device='cuda:3') 2024-08-18 16:24:34,660 INFO [train_multi_KD3.py:1149] (3/4) Epoch 27, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 16:24:34,663 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 16:24:45,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3986190.0, ans=0.125 2024-08-18 16:24:47,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=15.0 2024-08-18 16:24:48,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=3986190.0, ans=0.125 2024-08-18 16:25:04,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0 2024-08-18 16:25:42,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2024-08-18 16:25:49,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3986590.0, ans=0.2 2024-08-18 16:25:52,574 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12050, loss[loss=0.09736, beats_loss=0.008449, ecapa_loss=0.000154, whisper_loss=0.08737, over 18052.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01063, ecapa_loss=0.0001428, whisper_loss=0.09051, over 3891492.53 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:26:02,824 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 16:26:30,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3986890.0, ans=0.0 2024-08-18 16:26:31,304 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 16:26:38,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=3986990.0, ans=0.125 2024-08-18 16:26:41,054 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 39 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 16:26:49,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3986990.0, ans=0.125 2024-08-18 16:26:56,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=3987090.0, ans=0.0 2024-08-18 16:26:56,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=3987090.0, ans=0.125 2024-08-18 16:26:58,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3987090.0, ans=0.2 2024-08-18 16:27:02,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.347e+01 2.564e+01 2.841e+01 2.951e+02, threshold=5.127e+01, percent-clipped=2.0 2024-08-18 16:27:10,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=3987190.0, ans=0.025 2024-08-18 16:27:11,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12100, loss[loss=0.06673, beats_loss=0.01561, ecapa_loss=0.000118, whisper_loss=0.04993, over 19616.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01063, ecapa_loss=0.0001431, whisper_loss=0.08985, over 3879208.33 frames. 
], batch size: 84, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:27:14,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3987190.0, ans=0.125 2024-08-18 16:27:14,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3987190.0, ans=0.125 2024-08-18 16:27:16,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=3987190.0, ans=0.0 2024-08-18 16:27:37,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=3987290.0, ans=0.2 2024-08-18 16:27:54,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3987390.0, ans=0.125 2024-08-18 16:28:01,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3987490.0, ans=0.1 2024-08-18 16:28:15,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3987590.0, ans=0.125 2024-08-18 16:28:29,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12150, loss[loss=0.07561, beats_loss=0.009813, ecapa_loss=0.0002314, whisper_loss=0.06348, over 14266.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.000144, whisper_loss=0.08918, over 3866847.13 frames. ], batch size: 66, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:28:39,261 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 16:28:40,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=3987690.0, ans=0.0 2024-08-18 16:28:52,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=3987790.0, ans=0.2 2024-08-18 16:29:14,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=3987890.0, ans=0.0 2024-08-18 16:29:19,481 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.88 vs. limit=10.0 2024-08-18 16:29:22,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=3987990.0, ans=0.0 2024-08-18 16:29:39,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.809e+01 2.283e+01 2.537e+01 2.740e+01 4.505e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 16:29:47,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12200, loss[loss=0.09777, beats_loss=0.01178, ecapa_loss=0.000125, whisper_loss=0.08473, over 22644.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.0001436, whisper_loss=0.08916, over 3888004.10 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:30:03,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3988290.0, ans=0.125 2024-08-18 16:30:07,741 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:30:21,205 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
31 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 16:30:24,342 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.667e+00 2024-08-18 16:30:30,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3988490.0, ans=0.0 2024-08-18 16:31:00,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12250, loss[loss=0.09329, beats_loss=0.01056, ecapa_loss=0.0001444, whisper_loss=0.08129, over 15792.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001444, whisper_loss=0.08954, over 3859189.58 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:31:22,245 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 16:31:55,022 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 16:32:05,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3989090.0, ans=0.2 2024-08-18 16:32:07,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.312e+01 2.540e+01 2.795e+01 3.669e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-18 16:32:17,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12300, loss[loss=0.0903, beats_loss=0.009276, ecapa_loss=0.0001386, whisper_loss=0.07963, over 16870.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001438, whisper_loss=0.0897, over 3898300.57 frames. ], batch size: 63, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:32:23,011 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.16 vs. 
limit=22.5 2024-08-18 16:32:30,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=3989190.0, ans=0.0 2024-08-18 16:32:36,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=12.0 2024-08-18 16:32:46,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=3989390.0, ans=0.125 2024-08-18 16:32:51,595 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 16:33:05,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3989490.0, ans=0.0 2024-08-18 16:33:26,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3989590.0, ans=0.0 2024-08-18 16:33:28,671 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-18 16:33:29,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12350, loss[loss=0.1023, beats_loss=0.01189, ecapa_loss=0.0001429, whisper_loss=0.08894, over 23182.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001447, whisper_loss=0.08964, over 3893914.76 frames. ], batch size: 94, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:33:36,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=3989690.0, ans=0.0 2024-08-18 16:33:39,396 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-18 16:33:39,577 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3989690.0, ans=0.125 2024-08-18 16:33:42,232 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 16:34:00,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-08-18 16:34:03,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=3989890.0, ans=0.0 2024-08-18 16:34:04,241 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-18 16:34:09,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3989890.0, ans=0.1 2024-08-18 16:34:30,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=3990090.0, ans=0.02 2024-08-18 16:34:32,237 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 16:34:37,198 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=8.0 2024-08-18 16:34:37,310 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.315e+01 2.645e+01 2.969e+01 4.976e+01, threshold=5.289e+01, percent-clipped=0.0 2024-08-18 16:34:39,301 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 16:34:41,948 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 16:34:45,900 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12400, loss[loss=0.0954, beats_loss=0.007326, ecapa_loss=0.00014, whisper_loss=0.08667, over 17695.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001436, whisper_loss=0.08976, over 3917508.36 frames. 
], batch size: 68, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:34:54,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3990190.0, ans=0.0 2024-08-18 16:35:00,741 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 16:35:19,849 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 18 from LS+wenet, 21 from Vox, 56 fro AS 2024-08-18 16:35:40,715 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 44 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-18 16:35:55,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2024-08-18 16:35:56,180 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12450, loss[loss=0.1143, beats_loss=0.01093, ecapa_loss=0.0001357, whisper_loss=0.102, over 22916.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01046, ecapa_loss=0.0001437, whisper_loss=0.08978, over 3912997.82 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:36:15,107 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 16:36:16,528 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0 2024-08-18 16:36:18,591 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 16:36:19,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=3990790.0, ans=0.0 2024-08-18 16:36:30,469 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
30 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 16:36:38,019 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:36:58,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=3991090.0, ans=0.0 2024-08-18 16:37:00,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3991090.0, ans=0.05 2024-08-18 16:37:01,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.300e+01 2.585e+01 2.837e+01 4.531e+01, threshold=5.170e+01, percent-clipped=0.0 2024-08-18 16:37:10,414 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12500, loss[loss=0.1058, beats_loss=0.009894, ecapa_loss=0.0001074, whisper_loss=0.09486, over 18022.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001431, whisper_loss=0.08922, over 3889894.22 frames. ], batch size: 67, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:37:10,581 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 16:37:15,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3991190.0, ans=0.125 2024-08-18 16:37:16,365 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-18 16:37:41,189 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
33 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 16:37:59,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3991490.0, ans=0.0 2024-08-18 16:38:15,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3991590.0, ans=0.1 2024-08-18 16:38:23,443 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12550, loss[loss=0.09839, beats_loss=0.01097, ecapa_loss=0.0001373, whisper_loss=0.08605, over 21836.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.08935, over 3911849.81 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:38:50,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3991890.0, ans=0.0 2024-08-18 16:38:52,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=22.5 2024-08-18 16:38:53,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3991890.0, ans=0.0 2024-08-18 16:39:03,892 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-18 16:39:08,159 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
28 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 16:39:11,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3991990.0, ans=0.125 2024-08-18 16:39:25,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.342e+01 2.582e+01 2.915e+01 4.894e+01, threshold=5.164e+01, percent-clipped=0.0 2024-08-18 16:39:33,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12600, loss[loss=0.1045, beats_loss=0.01125, ecapa_loss=0.0001491, whisper_loss=0.09172, over 22230.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001443, whisper_loss=0.09011, over 3900482.59 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:39:37,812 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 16:39:39,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3992190.0, ans=0.125 2024-08-18 16:39:58,739 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-18 16:40:00,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3992390.0, ans=0.0 2024-08-18 16:40:06,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3992390.0, ans=0.125 2024-08-18 16:40:06,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3992390.0, ans=0.025 2024-08-18 16:40:34,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.90 vs. 
limit=22.5 2024-08-18 16:40:42,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2024-08-18 16:40:43,887 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12650, loss[loss=0.08064, beats_loss=0.01452, ecapa_loss=9.932e-05, whisper_loss=0.06513, over 17587.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001433, whisper_loss=0.08972, over 3879011.29 frames. ], batch size: 67, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:40:45,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3992690.0, ans=0.1 2024-08-18 16:40:55,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=3992690.0, ans=0.125 2024-08-18 16:41:06,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.87 vs. limit=15.0 2024-08-18 16:41:10,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3992790.0, ans=0.0 2024-08-18 16:41:11,055 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.42 vs. limit=15.0 2024-08-18 16:41:25,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3992990.0, ans=0.125 2024-08-18 16:41:45,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.296e+01 2.534e+01 2.850e+01 7.549e+01, threshold=5.068e+01, percent-clipped=1.0 2024-08-18 16:41:54,032 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12700, loss[loss=0.07999, beats_loss=0.01327, ecapa_loss=0.0001632, whisper_loss=0.06509, over 16841.00 frames. 
], tot_loss[loss=0.1018, beats_loss=0.01059, ecapa_loss=0.0001433, whisper_loss=0.08981, over 3883698.65 frames. ], batch size: 72, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:42:22,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3993390.0, ans=0.125 2024-08-18 16:42:29,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=3993390.0, ans=0.125 2024-08-18 16:42:31,588 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-18 16:42:36,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2024-08-18 16:42:45,645 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 16:42:47,146 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 16:42:49,558 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 16:42:52,403 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 16:43:01,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=3993590.0, ans=0.0 2024-08-18 16:43:03,353 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 16:43:06,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12750, loss[loss=0.1117, beats_loss=0.011, ecapa_loss=0.0001574, whisper_loss=0.09916, over 22522.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01056, ecapa_loss=0.0001436, whisper_loss=0.09041, over 3891971.76 frames. 
], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:43:08,971 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 16:43:09,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=3993690.0, ans=0.05 2024-08-18 16:43:27,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-18 16:43:29,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3993790.0, ans=0.0 2024-08-18 16:43:31,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0 2024-08-18 16:43:33,571 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 16:43:34,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3993890.0, ans=0.125 2024-08-18 16:43:38,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.03 vs. limit=15.0 2024-08-18 16:43:44,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3993890.0, ans=0.125 2024-08-18 16:43:46,625 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 16:44:01,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=3993990.0, ans=0.2 2024-08-18 16:44:09,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.299e+01 2.591e+01 2.848e+01 5.343e+01, threshold=5.182e+01, percent-clipped=2.0 2024-08-18 16:44:10,232 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-18 16:44:18,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12800, loss[loss=0.07034, beats_loss=0.01251, ecapa_loss=0.0001358, whisper_loss=0.05648, over 19371.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001445, whisper_loss=0.08991, over 3910472.71 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:44:20,022 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 16:44:21,490 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 16:44:29,654 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-18 16:44:31,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=3994290.0, ans=0.0 2024-08-18 16:44:56,786 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 24 from Vox, 33 from AS 2024-08-18 16:45:00,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3994490.0, ans=0.0 2024-08-18 16:45:03,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=3994490.0, ans=0.0 2024-08-18 16:45:15,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3994590.0, ans=0.0 2024-08-18 16:45:27,115 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12850, loss[loss=0.1006, beats_loss=0.01107, ecapa_loss=0.0001151, whisper_loss=0.0884, over 17874.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001453, whisper_loss=0.08932, over 3849506.76 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:45:29,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3994690.0, ans=0.125 2024-08-18 16:45:42,073 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 17 from LS+wenet, 20 from Vox, 47 from AS 2024-08-18 16:45:42,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3994790.0, ans=0.125 2024-08-18 16:45:56,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=3994890.0, ans=0.125 2024-08-18 16:46:09,271 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 17 from Vox, 51 from AS 2024-08-18 16:46:11,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.41 vs. limit=15.0 2024-08-18 16:46:24,454 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
23 from LS+wenet, 16 from Vox, 26 from AS 2024-08-18 16:46:26,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3995090.0, ans=0.1 2024-08-18 16:46:27,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.211e+01 2.399e+01 2.704e+01 4.332e+02, threshold=4.798e+01, percent-clipped=1.0 2024-08-18 16:46:36,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12900, loss[loss=0.1169, beats_loss=0.008784, ecapa_loss=0.0001537, whisper_loss=0.1065, over 20552.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01061, ecapa_loss=0.0001452, whisper_loss=0.08903, over 3845254.15 frames. ], batch size: 80, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:46:41,903 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS 2024-08-18 16:47:00,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3995290.0, ans=0.0 2024-08-18 16:47:16,680 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 from AS 2024-08-18 16:47:47,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=3995690.0, ans=0.2 2024-08-18 16:47:47,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 12950, loss[loss=0.1002, beats_loss=0.01072, ecapa_loss=0.0001266, whisper_loss=0.0882, over 16282.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01057, ecapa_loss=0.0001443, whisper_loss=0.08916, over 3835132.86 frames. ], batch size: 65, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:47:51,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.42 vs. 
limit=15.0 2024-08-18 16:47:52,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=3995690.0, ans=0.125 2024-08-18 16:47:54,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3995690.0, ans=0.1 2024-08-18 16:47:56,281 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 21 from Vox, 23 from AS 2024-08-18 16:48:02,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3995790.0, ans=0.2 2024-08-18 16:48:20,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=3995890.0, ans=0.0 2024-08-18 16:48:28,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=3995890.0, ans=0.0 2024-08-18 16:48:29,018 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 16:48:44,656 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 26 from Vox, 35 from AS 2024-08-18 16:48:48,465 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.384e+01 2.658e+01 2.947e+01 5.232e+01, threshold=5.316e+01, percent-clipped=1.0 2024-08-18 16:48:50,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=3996090.0, ans=0.07 2024-08-18 16:48:54,651 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 17 from Vox, 21 from AS 2024-08-18 16:48:57,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13000, loss[loss=0.1065, beats_loss=0.008163, ecapa_loss=0.000161, whisper_loss=0.09672, over 22620.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001446, whisper_loss=0.09007, over 3843053.93 frames. 
], batch size: 91, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:49:04,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=3996190.0, ans=0.0 2024-08-18 16:49:17,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3996290.0, ans=0.125 2024-08-18 16:49:18,298 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 from AS 2024-08-18 16:49:25,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=3996390.0, ans=0.0 2024-08-18 16:49:25,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.70 vs. limit=22.5 2024-08-18 16:49:53,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3996590.0, ans=0.5 2024-08-18 16:49:59,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=3996590.0, ans=22.5 2024-08-18 16:50:06,165 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13050, loss[loss=0.09341, beats_loss=0.01156, ecapa_loss=0.0001267, whisper_loss=0.08059, over 22013.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001448, whisper_loss=0.08996, over 3870156.63 frames. ], batch size: 91, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:50:07,950 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 40 from LS+wenet, 25 from Vox, 27 from AS 2024-08-18 16:50:29,287 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 from AS 2024-08-18 16:50:33,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=3996890.0, ans=0.125 2024-08-18 16:50:44,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=3996890.0, ans=0.125 2024-08-18 16:50:50,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3996990.0, ans=0.5 2024-08-18 16:50:57,091 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 16:50:59,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=3996990.0, ans=0.0 2024-08-18 16:50:59,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2024-08-18 16:51:06,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.279e+01 2.532e+01 2.847e+01 6.978e+01, threshold=5.064e+01, percent-clipped=1.0 2024-08-18 16:51:09,029 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-08-18 16:51:10,903 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 from AS 2024-08-18 16:51:12,438 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 12 from Vox, 28 from AS 2024-08-18 16:51:12,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=3997090.0, ans=0.0 2024-08-18 16:51:14,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13100, loss[loss=0.1223, beats_loss=0.008238, ecapa_loss=0.000148, whisper_loss=0.1126, over 23191.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001451, whisper_loss=0.09, over 3860887.68 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:51:38,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2024-08-18 16:51:58,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=3997490.0, ans=0.2 2024-08-18 16:52:00,840 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 10 from Vox, 24 from AS 2024-08-18 16:52:04,916 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 25 from LS+wenet, 27 from Vox, 45 from AS 2024-08-18 16:52:06,378 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 13 from Vox, 28 from AS 2024-08-18 16:52:23,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=3997590.0, ans=0.2 2024-08-18 16:52:25,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13150, loss[loss=0.1138, beats_loss=0.008919, ecapa_loss=0.0001476, whisper_loss=0.1034, over 17320.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001438, whisper_loss=0.09006, over 3856446.01 frames. ], batch size: 66, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:52:29,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3997690.0, ans=0.1 2024-08-18 16:52:42,782 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 23 from Vox, 25 from AS 2024-08-18 16:52:50,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3997790.0, ans=0.125 2024-08-18 16:52:52,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3997890.0, ans=0.125 2024-08-18 16:53:25,119 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.700e+01 2.321e+01 2.625e+01 2.876e+01 3.803e+01, threshold=5.250e+01, percent-clipped=0.0 2024-08-18 16:53:33,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13200, loss[loss=0.07746, beats_loss=0.01327, ecapa_loss=0.0001415, whisper_loss=0.06277, over 14812.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001427, whisper_loss=0.09021, over 3833859.33 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:53:38,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=3998190.0, ans=0.2 2024-08-18 16:53:54,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=3998290.0, ans=0.125 2024-08-18 16:53:55,030 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 from AS 2024-08-18 16:54:04,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-08-18 16:54:19,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=3998490.0, ans=0.125 2024-08-18 16:54:38,187 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 28 from Vox, 35 from AS 2024-08-18 16:54:39,489 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13250, loss[loss=0.1057, beats_loss=0.01006, ecapa_loss=0.0001801, whisper_loss=0.09381, over 22178.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001439, whisper_loss=0.09053, over 3851187.62 frames. ], batch size: 93, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:54:40,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2024-08-18 16:54:40,372 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=14.54 vs. limit=15.0 2024-08-18 16:54:55,472 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 23 from Vox, 44 from AS 2024-08-18 16:54:55,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=3998790.0, ans=0.1 2024-08-18 16:55:40,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.403e+01 2.676e+01 3.041e+01 3.478e+02, threshold=5.351e+01, percent-clipped=3.0 2024-08-18 16:55:40,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=3999090.0, ans=0.0 2024-08-18 16:55:48,181 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13300, loss[loss=0.121, beats_loss=0.009402, ecapa_loss=0.0001707, whisper_loss=0.1099, over 20921.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01044, ecapa_loss=0.0001447, whisper_loss=0.09032, over 3841511.65 frames. 
], batch size: 85, lr: 2.25e-03, grad_scale: 1.152921504606847e+18 2024-08-18 16:55:54,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=3999190.0, ans=0.125 2024-08-18 16:55:55,203 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.66 vs. limit=22.5 2024-08-18 16:55:56,816 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 23 from LS+wenet, 25 from Vox, 46 from AS 2024-08-18 16:56:05,481 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 from AS 2024-08-18 16:56:15,098 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 20 from Vox, 16 from AS 2024-08-18 16:56:17,520 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 16:56:35,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3999490.0, ans=0.1 2024-08-18 16:56:35,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=3999490.0, ans=0.125 2024-08-18 16:56:46,399 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 15 from Vox, 30 from AS 2024-08-18 16:56:52,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.37 vs. limit=15.0 2024-08-18 16:56:52,945 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13350, loss[loss=0.1018, beats_loss=0.01071, ecapa_loss=0.0001238, whisper_loss=0.08987, over 23856.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.09096, over 3874011.89 frames. 
], batch size: 95, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:57:29,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=3999890.0, ans=0.09899494936611666 2024-08-18 16:57:30,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3999890.0, ans=0.2 2024-08-18 16:57:37,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=3999990.0, ans=0.0 2024-08-18 16:57:38,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=3999990.0, ans=0.025 2024-08-18 16:57:38,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=3999990.0, ans=0.125 2024-08-18 16:57:40,881 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 20 from Vox, 24 from AS 2024-08-18 16:57:47,572 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 from AS 2024-08-18 16:57:50,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4000090.0, ans=0.125 2024-08-18 16:57:56,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.674e+01 2.330e+01 2.620e+01 2.943e+01 5.095e+01, threshold=5.239e+01, percent-clipped=0.0 2024-08-18 16:58:00,711 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 from AS 2024-08-18 16:58:01,976 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 23 from Vox, 39 from AS 2024-08-18 16:58:03,315 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13400, loss[loss=0.1042, beats_loss=0.01032, ecapa_loss=0.0001366, whisper_loss=0.09247, over 22229.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01045, ecapa_loss=0.0001444, whisper_loss=0.09039, over 3866947.53 frames. ], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:58:04,739 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 20 from LS+wenet, 31 from Vox, 44 from AS 2024-08-18 16:58:13,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4000190.0, ans=0.05 2024-08-18 16:58:26,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5 2024-08-18 16:58:36,393 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 14 from LS+wenet, 21 from Vox, 31 from AS 2024-08-18 16:58:39,292 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 21 from Vox, 30 from AS 2024-08-18 16:58:39,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4000390.0, ans=0.125 2024-08-18 16:58:42,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4000390.0, ans=0.0 2024-08-18 16:59:00,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4000590.0, ans=0.125 2024-08-18 16:59:01,917 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 16:59:15,441 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13450, loss[loss=0.08827, beats_loss=0.008809, ecapa_loss=0.0002257, whisper_loss=0.07721, over 12285.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001455, whisper_loss=0.09064, over 3884473.76 frames. 
], batch size: 54, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 16:59:16,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4000690.0, ans=0.1 2024-08-18 16:59:28,190 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 25 from Vox, 28 from AS 2024-08-18 16:59:52,918 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 17:00:11,665 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 19 from Vox, 21 from AS 2024-08-18 17:00:23,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4001090.0, ans=0.125 2024-08-18 17:00:29,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.614e+01 2.273e+01 2.579e+01 2.870e+01 2.046e+02, threshold=5.158e+01, percent-clipped=2.0 2024-08-18 17:00:36,120 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13500, loss[loss=0.111, beats_loss=0.008806, ecapa_loss=0.0001519, whisper_loss=0.1006, over 18478.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001465, whisper_loss=0.09012, over 3868819.54 frames. ], batch size: 71, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:00:43,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-18 17:01:15,519 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 17 from Vox, 30 from AS 2024-08-18 17:01:23,370 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
22 from LS+wenet, 26 from Vox, 31 from AS 2024-08-18 17:01:25,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4001490.0, ans=0.025 2024-08-18 17:01:30,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4001490.0, ans=0.125 2024-08-18 17:01:54,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13550, loss[loss=0.09887, beats_loss=0.01053, ecapa_loss=0.000143, whisper_loss=0.08691, over 21042.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001456, whisper_loss=0.08982, over 3848482.07 frames. ], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:01:55,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4001690.0, ans=10.0 2024-08-18 17:02:15,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4001790.0, ans=0.125 2024-08-18 17:02:28,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4001890.0, ans=0.125 2024-08-18 17:02:41,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4001990.0, ans=0.125 2024-08-18 17:03:04,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.302e+01 2.500e+01 2.862e+01 4.462e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 17:03:12,559 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13600, loss[loss=0.09788, beats_loss=0.008662, ecapa_loss=0.0001872, whisper_loss=0.08735, over 17376.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001435, whisper_loss=0.09014, over 3869142.83 frames. 
], batch size: 73, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:03:23,301 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 17:03:26,916 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 26 from Vox, 31 from AS 2024-08-18 17:03:38,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4002290.0, ans=0.125 2024-08-18 17:03:38,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4002290.0, ans=0.125 2024-08-18 17:03:41,596 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 21 from Vox, 23 from AS 2024-08-18 17:03:44,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.68 vs. limit=15.0 2024-08-18 17:04:06,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-18 17:04:18,342 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13650, loss[loss=0.1137, beats_loss=0.009357, ecapa_loss=0.0001799, whisper_loss=0.1026, over 20366.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001445, whisper_loss=0.08978, over 3866362.19 frames. ], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:04:24,318 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 27 from Vox, 42 from AS 2024-08-18 17:04:32,409 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.81 vs. 
limit=15.0 2024-08-18 17:04:46,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4002890.0, ans=0.125 2024-08-18 17:04:53,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4002890.0, ans=0.125 2024-08-18 17:04:56,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4002990.0, ans=0.125 2024-08-18 17:04:58,628 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 30 from LS+wenet, 12 from Vox, 32 from AS 2024-08-18 17:05:03,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4002990.0, ans=0.125 2024-08-18 17:05:12,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4003090.0, ans=0.125 2024-08-18 17:05:13,229 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.369e+01 2.636e+01 3.031e+01 4.631e+02, threshold=5.273e+01, percent-clipped=3.0 2024-08-18 17:05:15,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-08-18 17:05:19,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13700, loss[loss=0.09293, beats_loss=0.01235, ecapa_loss=0.0001318, whisper_loss=0.07926, over 18415.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001436, whisper_loss=0.09043, over 3866191.83 frames. ], batch size: 74, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:05:29,612 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
17 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 17:05:35,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4003290.0, ans=0.0 2024-08-18 17:05:49,285 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 22 from Vox, 27 from AS 2024-08-18 17:05:50,255 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08100207895040512, model_norm_threshold=52.72901916503906 2024-08-18 17:05:50,436 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.003e+05, grad_sumsq=1.003e+05, orig_rms_sq=1.000e+00 2024-08-18 17:05:53,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4003390.0, ans=0.0 2024-08-18 17:06:14,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4003590.0, ans=0.5 2024-08-18 17:06:20,858 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13750, loss[loss=0.08151, beats_loss=0.009964, ecapa_loss=0.0001453, whisper_loss=0.07009, over 14410.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001442, whisper_loss=0.09038, over 3850525.39 frames. ], batch size: 57, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:06:30,122 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 24 from Vox, 40 from AS 2024-08-18 17:06:50,840 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
23 from LS+wenet, 25 from Vox, 25 from AS 2024-08-18 17:07:03,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4003990.0, ans=0.0 2024-08-18 17:07:14,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4004090.0, ans=0.1 2024-08-18 17:07:16,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.418e+01 2.607e+01 2.994e+01 6.510e+02, threshold=5.215e+01, percent-clipped=3.0 2024-08-18 17:07:21,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4004090.0, ans=0.125 2024-08-18 17:07:23,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13800, loss[loss=0.1036, beats_loss=0.0112, ecapa_loss=0.0001079, whisper_loss=0.09134, over 23645.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0106, ecapa_loss=0.0001433, whisper_loss=0.0907, over 3862618.77 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:07:33,693 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 31 from Vox, 30 from AS 2024-08-18 17:07:39,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4004290.0, ans=0.1 2024-08-18 17:07:39,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4004290.0, ans=0.1 2024-08-18 17:07:52,280 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2024-08-18 17:07:55,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4004390.0, ans=0.0 2024-08-18 17:07:58,375 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=22.5 2024-08-18 17:07:59,195 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 33 from Vox, 34 from AS 2024-08-18 17:08:01,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4004490.0, ans=0.125 2024-08-18 17:08:10,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4004490.0, ans=0.125 2024-08-18 17:08:12,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4004490.0, ans=0.1 2024-08-18 17:08:12,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4004490.0, ans=0.1 2024-08-18 17:08:14,406 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 13 from Vox, 39 from AS 2024-08-18 17:08:14,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4004590.0, ans=10.0 2024-08-18 17:08:16,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4004590.0, ans=0.125 2024-08-18 17:08:26,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13850, loss[loss=0.1324, beats_loss=0.008354, ecapa_loss=0.0001356, whisper_loss=0.1227, over 23645.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.09168, over 3879320.90 frames. 
], batch size: 89, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:09:01,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4004890.0, ans=0.1 2024-08-18 17:09:10,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4004990.0, ans=0.125 2024-08-18 17:09:24,457 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.848e+01 2.330e+01 2.560e+01 2.931e+01 5.468e+01, threshold=5.120e+01, percent-clipped=1.0 2024-08-18 17:09:25,811 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 from AS 2024-08-18 17:09:29,516 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 27 from Vox, 39 from AS 2024-08-18 17:09:30,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13900, loss[loss=0.08792, beats_loss=0.01108, ecapa_loss=0.000144, whisper_loss=0.07541, over 22413.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01052, ecapa_loss=0.0001431, whisper_loss=0.09109, over 3882470.49 frames. ], batch size: 88, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:09:34,547 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 from AS 2024-08-18 17:09:37,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4005190.0, ans=0.125 2024-08-18 17:09:53,504 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 32 from Vox, 27 from AS 2024-08-18 17:09:57,564 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 from AS 2024-08-18 17:10:00,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4005390.0, ans=0.09899494936611666 2024-08-18 17:10:16,235 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 17:10:25,765 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 26 from Vox, 29 from AS 2024-08-18 17:10:27,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4005590.0, ans=0.0 2024-08-18 17:10:34,059 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 29 from LS+wenet, 15 from Vox, 26 from AS 2024-08-18 17:10:35,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4005690.0, ans=0.125 2024-08-18 17:10:36,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 13950, loss[loss=0.09948, beats_loss=0.009024, ecapa_loss=0.0001417, whisper_loss=0.08904, over 15902.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001433, whisper_loss=0.09138, over 3888743.93 frames. ], batch size: 60, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:10:38,934 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 22 from Vox, 36 from AS 2024-08-18 17:10:50,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4005790.0, ans=0.125 2024-08-18 17:10:51,395 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 21 from Vox, 37 from AS 2024-08-18 17:11:01,724 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 37 from LS+wenet, 19 from Vox, 31 from AS 2024-08-18 17:11:04,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-08-18 17:11:09,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4005890.0, ans=0.1 2024-08-18 17:11:10,707 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 from AS 2024-08-18 17:11:14,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4005990.0, ans=0.125 2024-08-18 17:11:22,680 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 19 from Vox, 19 from AS 2024-08-18 17:11:33,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.308e+01 2.540e+01 2.874e+01 4.374e+01, threshold=5.080e+01, percent-clipped=0.0 2024-08-18 17:11:35,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4006090.0, ans=0.1 2024-08-18 17:11:40,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14000, loss[loss=0.1032, beats_loss=0.00907, ecapa_loss=0.0001316, whisper_loss=0.09277, over 15816.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.0105, ecapa_loss=0.0001424, whisper_loss=0.09151, over 3861972.17 frames. ], batch size: 63, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:11:50,480 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 28 from Vox, 28 from AS 2024-08-18 17:11:56,980 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 21 from Vox, 28 from AS 2024-08-18 17:11:58,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4006290.0, ans=0.0 2024-08-18 17:12:06,077 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 23 from Vox, 31 from AS 2024-08-18 17:12:07,311 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 25 from Vox, 37 from AS 2024-08-18 17:12:26,710 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 17 from Vox, 35 from AS 2024-08-18 17:12:30,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.26 vs. 
limit=22.5 2024-08-18 17:12:42,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4006590.0, ans=0.125 2024-08-18 17:12:44,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14050, loss[loss=0.1129, beats_loss=0.007836, ecapa_loss=0.0001172, whisper_loss=0.1039, over 15673.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01049, ecapa_loss=0.0001423, whisper_loss=0.09161, over 3834622.59 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:12:44,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4006690.0, ans=0.125 2024-08-18 17:12:48,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4006690.0, ans=0.07 2024-08-18 17:13:07,692 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.40 vs. limit=15.0 2024-08-18 17:13:21,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4006990.0, ans=0.125 2024-08-18 17:13:22,518 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 29 from Vox, 25 from AS 2024-08-18 17:13:27,459 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 23 from Vox, 24 from AS 2024-08-18 17:13:32,425 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 26 from Vox, 38 from AS 2024-08-18 17:13:36,748 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
30 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 17:13:41,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4007090.0, ans=10.0 2024-08-18 17:13:41,651 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.335e+01 2.606e+01 2.930e+01 4.821e+01, threshold=5.212e+01, percent-clipped=0.0 2024-08-18 17:13:47,601 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14100, loss[loss=0.1056, beats_loss=0.01187, ecapa_loss=0.0001461, whisper_loss=0.0923, over 22760.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01055, ecapa_loss=0.0001435, whisper_loss=0.09082, over 3859470.09 frames. ], batch size: 92, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:13:54,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=22.5 2024-08-18 17:14:05,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.84 vs. limit=15.0 2024-08-18 17:14:08,623 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 17:14:09,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2024-08-18 17:14:16,144 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-18 17:14:35,442 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 17:14:37,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4007590.0, ans=0.125 2024-08-18 17:14:39,389 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 17:14:49,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4007690.0, ans=0.125 2024-08-18 17:14:50,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14150, loss[loss=0.1007, beats_loss=0.01035, ecapa_loss=0.0001152, whisper_loss=0.08921, over 17718.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0106, ecapa_loss=0.0001432, whisper_loss=0.09012, over 3844029.47 frames. ], batch size: 70, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:15:05,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4007790.0, ans=0.125 2024-08-18 17:15:10,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4007790.0, ans=0.125 2024-08-18 17:15:42,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4008090.0, ans=0.125 2024-08-18 17:15:43,921 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.27 vs. limit=22.5 2024-08-18 17:15:45,749 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
30 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-18 17:15:45,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4008090.0, ans=0.0 2024-08-18 17:15:47,911 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.262e+01 2.524e+01 2.804e+01 4.310e+01, threshold=5.048e+01, percent-clipped=0.0 2024-08-18 17:15:52,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4008090.0, ans=0.0 2024-08-18 17:15:54,605 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14200, loss[loss=0.1139, beats_loss=0.009514, ecapa_loss=0.0001493, whisper_loss=0.1029, over 19759.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001425, whisper_loss=0.09021, over 3878057.82 frames. ], batch size: 78, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:15:58,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4008190.0, ans=0.07 2024-08-18 17:16:03,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4008190.0, ans=0.125 2024-08-18 17:16:08,389 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 17:16:08,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4008290.0, ans=0.1 2024-08-18 17:16:12,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0 2024-08-18 17:16:33,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.30 vs. 
limit=10.0 2024-08-18 17:16:33,928 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4008490.0, ans=0.125 2024-08-18 17:16:49,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4008590.0, ans=0.0 2024-08-18 17:16:57,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14250, loss[loss=0.1084, beats_loss=0.00991, ecapa_loss=0.0001367, whisper_loss=0.09708, over 14388.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.09064, over 3894238.27 frames. ], batch size: 55, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:16:58,720 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 17:17:01,290 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-18 17:17:22,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2024-08-18 17:17:35,247 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 18 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-18 17:17:55,600 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.243e+01 2.470e+01 2.770e+01 7.680e+01, threshold=4.941e+01, percent-clipped=2.0 2024-08-18 17:17:56,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.58 vs. limit=10.0 2024-08-18 17:17:58,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4009090.0, ans=0.0 2024-08-18 17:18:01,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14300, loss[loss=0.1109, beats_loss=0.007577, ecapa_loss=0.0001886, whisper_loss=0.1014, over 17172.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01052, ecapa_loss=0.0001419, whisper_loss=0.09074, over 3909001.65 frames. ], batch size: 69, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:18:08,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4009190.0, ans=0.125 2024-08-18 17:18:21,674 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 17:18:40,343 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-18 17:18:41,861 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4009490.0, ans=0.125 2024-08-18 17:18:48,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4009490.0, ans=0.0 2024-08-18 17:18:49,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4009490.0, ans=0.0 2024-08-18 17:18:58,503 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 17:19:05,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5 2024-08-18 17:19:05,875 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14350, loss[loss=0.1021, beats_loss=0.01007, ecapa_loss=0.0001446, whisper_loss=0.0906, over 13839.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01061, ecapa_loss=0.0001418, whisper_loss=0.09049, over 3896719.86 frames. ], batch size: 56, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:19:12,376 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 17:19:38,580 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
29 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 17:19:59,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2024-08-18 17:20:05,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.040e+01 2.366e+01 2.598e+01 2.848e+01 4.928e+01, threshold=5.195e+01, percent-clipped=0.0 2024-08-18 17:20:13,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14400, loss[loss=0.1094, beats_loss=0.009283, ecapa_loss=0.0001689, whisper_loss=0.09845, over 22366.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01057, ecapa_loss=0.0001434, whisper_loss=0.09094, over 3927966.20 frames. ], batch size: 90, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:20:13,272 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 17:20:16,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4010190.0, ans=0.125 2024-08-18 17:20:16,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.15 vs. limit=15.0 2024-08-18 17:20:17,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4010190.0, ans=0.125 2024-08-18 17:20:18,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.06 vs. limit=15.0 2024-08-18 17:20:27,133 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 17:20:31,259 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 17:20:35,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4010290.0, ans=0.1 2024-08-18 17:20:40,195 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 17:20:40,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4010390.0, ans=0.125 2024-08-18 17:20:50,536 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 17:21:04,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-18 17:21:21,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 27, batch 14450, loss[loss=0.1104, beats_loss=0.01106, ecapa_loss=0.0001553, whisper_loss=0.09783, over 22790.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01057, ecapa_loss=0.0001444, whisper_loss=0.09057, over 3907805.11 frames. ], batch size: 93, lr: 2.25e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:21:21,772 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-18 17:21:31,962 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-18 17:21:36,698 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2024-08-18 17:21:42,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4010790.0, ans=0.0 2024-08-18 17:21:54,587 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 15 from Vox, 35 fro AS 2024-08-18 17:21:59,429 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 17:22:00,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4010990.0, ans=0.125 2024-08-18 17:22:02,399 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-18 17:22:35,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 0, loss[loss=0.1072, beats_loss=0.009857, ecapa_loss=0.0001714, whisper_loss=0.09566, over 20833.00 frames. ], tot_loss[loss=0.1072, beats_loss=0.009857, ecapa_loss=0.0001714, whisper_loss=0.09566, over 20833.00 frames. ], batch size: 84, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:22:35,125 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 17:23:13,225 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on ASR_libri: loss=0.253, beats_loss=0, ecapa_loss=0.000516, whisper_loss=0.2479, over 922467.00 frames. 2024-08-18 17:23:27,333 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on SV_voxceleb1: loss=0.004085, beats_loss=0, ecapa_loss=0.0004085, whisper_loss=0, over 939242.00 frames. 2024-08-18 17:23:59,419 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.8834, 1.1876, 1.3349, 0.4834, 1.0594, 1.3656, 0.9292, 0.9852], device='cuda:3') 2024-08-18 17:25:15,867 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 17:25:15,870 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 17:25:30,066 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.404e+01 2.660e+01 3.020e+01 3.509e+02, threshold=5.320e+01, percent-clipped=1.0 2024-08-18 17:25:40,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4011180.0, ans=0.2 2024-08-18 17:25:40,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4011180.0, ans=0.125 2024-08-18 17:25:52,235 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 17:26:23,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4011280.0, ans=0.125 2024-08-18 17:26:58,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4011480.0, ans=0.035 2024-08-18 17:27:14,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 50, loss[loss=0.07179, beats_loss=0.01018, ecapa_loss=0.0001216, whisper_loss=0.06039, over 17453.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.009578, ecapa_loss=0.0001476, whisper_loss=0.08959, over 867535.21 frames. ], batch size: 67, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:27:27,494 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 17:27:28,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. 
limit=15.0 2024-08-18 17:27:34,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4011580.0, ans=0.125 2024-08-18 17:27:36,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4011680.0, ans=0.0 2024-08-18 17:27:58,126 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 12 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 17:28:02,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4011780.0, ans=0.04949747468305833 2024-08-18 17:28:17,008 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 17:28:18,706 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-18 17:28:24,917 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 17:28:26,990 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 23 from Vox, 19 fro AS 2024-08-18 17:28:29,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4011880.0, ans=0.125 2024-08-18 17:28:31,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4011880.0, ans=0.1 2024-08-18 17:28:32,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4011880.0, ans=0.125 2024-08-18 17:29:03,713 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 100, loss[loss=0.1082, beats_loss=0.008594, ecapa_loss=0.0001398, whisper_loss=0.09822, over 23841.00 frames. ], tot_loss[loss=0.09966, beats_loss=0.009382, ecapa_loss=0.0001442, whisper_loss=0.08883, over 1531666.23 frames. 
], batch size: 92, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:29:07,218 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.87 vs. limit=15.0 2024-08-18 17:29:08,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-18 17:29:12,361 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 17:29:15,950 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.274e+01 2.549e+01 2.774e+01 3.166e+01 3.794e+01, threshold=5.547e+01, percent-clipped=0.0 2024-08-18 17:29:23,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-18 17:29:25,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2024-08-18 17:29:48,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=8.0 2024-08-18 17:29:49,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4012280.0, ans=0.125 2024-08-18 17:29:55,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4012280.0, ans=0.1 2024-08-18 17:30:07,687 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 17:30:09,748 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 17:30:16,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4012380.0, ans=0.125 2024-08-18 17:30:32,551 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-18 17:30:32,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4012480.0, ans=0.1 2024-08-18 17:30:41,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 150, loss[loss=0.07907, beats_loss=0.008588, ecapa_loss=0.0001989, whisper_loss=0.0685, over 16493.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.009401, ecapa_loss=0.0001446, whisper_loss=0.08966, over 2014301.57 frames. ], batch size: 68, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:30:50,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2024-08-18 17:31:20,842 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-18 17:31:38,744 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 17:31:51,886 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 17:31:58,327 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 17:31:58,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4013080.0, ans=0.09899494936611666 2024-08-18 17:31:59,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 200, loss[loss=0.09899, beats_loss=0.0118, ecapa_loss=0.0001401, whisper_loss=0.08579, over 22894.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.009549, ecapa_loss=0.0001442, whisper_loss=0.09099, over 2412269.20 frames. ], batch size: 89, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:32:08,138 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.367e+01 2.619e+01 2.925e+01 1.442e+02, threshold=5.239e+01, percent-clipped=3.0 2024-08-18 17:32:17,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4013180.0, ans=0.125 2024-08-18 17:32:24,983 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 17:32:26,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.87 vs. limit=10.0 2024-08-18 17:32:28,037 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 17:32:29,073 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 31 from Vox, 37 fro AS 2024-08-18 17:32:43,471 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 29 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-18 17:33:08,136 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 250, loss[loss=0.1053, beats_loss=0.01042, ecapa_loss=0.0001538, whisper_loss=0.09332, over 23266.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.009815, ecapa_loss=0.0001448, whisper_loss=0.09029, over 2734016.18 frames. ], batch size: 95, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:33:08,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.90 vs. 
limit=22.5 2024-08-18 17:33:11,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4013580.0, ans=0.125 2024-08-18 17:33:21,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4013680.0, ans=0.125 2024-08-18 17:33:32,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4013680.0, ans=0.125 2024-08-18 17:33:35,686 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 38 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 17:33:42,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4013780.0, ans=0.0 2024-08-18 17:33:44,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4013780.0, ans=0.2 2024-08-18 17:33:51,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4013880.0, ans=0.125 2024-08-18 17:33:58,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4013880.0, ans=0.0 2024-08-18 17:34:02,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4013980.0, ans=0.1 2024-08-18 17:34:09,375 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03464806452393532, model_norm_threshold=52.38976287841797 2024-08-18 17:34:09,544 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.19, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.434e+05, grad_sumsq=4.434e+05, orig_rms_sq=1.000e+00 2024-08-18 17:34:15,892 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 300, loss[loss=0.08135, beats_loss=0.0113, ecapa_loss=0.0001664, 
whisper_loss=0.06838, over 15094.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009912, ecapa_loss=0.0001463, whisper_loss=0.09056, over 2995904.33 frames. ], batch size: 64, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:34:22,683 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 17:34:23,788 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.275e+01 2.490e+01 2.772e+01 1.512e+03, threshold=4.979e+01, percent-clipped=1.0 2024-08-18 17:34:26,553 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 17:34:33,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-18 17:35:01,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4014380.0, ans=0.125 2024-08-18 17:35:19,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 350, loss[loss=0.09969, beats_loss=0.00932, ecapa_loss=0.0001263, whisper_loss=0.08911, over 17134.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009959, ecapa_loss=0.000145, whisper_loss=0.09067, over 3146603.56 frames. ], batch size: 69, lr: 2.21e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:35:22,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4014580.0, ans=0.125 2024-08-18 17:35:34,781 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 17:35:43,278 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-18 17:35:54,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4014780.0, ans=0.0 2024-08-18 17:36:20,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 400, loss[loss=0.09752, beats_loss=0.008484, ecapa_loss=0.0001469, whisper_loss=0.08757, over 18138.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01005, ecapa_loss=0.0001438, whisper_loss=0.08998, over 3292725.55 frames. ], batch size: 71, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:36:28,117 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.258e+01 2.514e+01 2.865e+01 8.622e+01, threshold=5.028e+01, percent-clipped=3.0 2024-08-18 17:36:49,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4015280.0, ans=0.1 2024-08-18 17:36:55,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4015280.0, ans=0.2 2024-08-18 17:37:04,125 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06247842684388161, model_norm_threshold=50.280113220214844 2024-08-18 17:37:04,285 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.000e+05, grad_sumsq=1.000e+05, orig_rms_sq=1.000e+00 2024-08-18 17:37:05,737 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 17:37:09,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4015480.0, ans=0.0 2024-08-18 17:37:12,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4015480.0, ans=0.0 2024-08-18 17:37:18,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4015480.0, ans=0.1 2024-08-18 17:37:18,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4015480.0, ans=0.125 2024-08-18 17:37:23,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 450, loss[loss=0.1142, beats_loss=0.008192, ecapa_loss=0.0001179, whisper_loss=0.1048, over 15535.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01019, ecapa_loss=0.0001446, whisper_loss=0.08987, over 3425122.40 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:37:26,816 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-18 17:37:33,517 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-18 17:37:38,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4015680.0, ans=0.0 2024-08-18 17:37:49,186 WARNING [optim.py:496] (3/4) Scaling gradients by 0.02706790715456009, model_norm_threshold=50.280113220214844 2024-08-18 17:37:49,353 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.902e+05, grad_sumsq=3.776e+07, orig_rms_sq=1.033e-02 2024-08-18 17:37:53,299 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 17:37:55,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.69 vs. limit=12.0 2024-08-18 17:37:59,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.74 vs. limit=15.0 2024-08-18 17:38:03,742 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2024-08-18 17:38:25,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 500, loss[loss=0.1085, beats_loss=0.00931, ecapa_loss=0.000138, whisper_loss=0.0978, over 18695.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01021, ecapa_loss=0.0001435, whisper_loss=0.08977, over 3488704.96 frames. ], batch size: 72, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:38:26,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-18 17:38:27,739 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 17:38:32,627 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.936e+01 2.374e+01 2.639e+01 2.877e+01 1.858e+03, threshold=5.278e+01, percent-clipped=3.0 2024-08-18 17:38:38,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4016180.0, ans=0.0 2024-08-18 17:38:40,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4016180.0, ans=0.2 2024-08-18 17:38:47,624 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
18 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-18 17:38:47,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4016180.0, ans=0.2 2024-08-18 17:38:50,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4016280.0, ans=0.125 2024-08-18 17:39:01,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4016380.0, ans=0.05 2024-08-18 17:39:05,004 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 17:39:13,657 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 24 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-18 17:39:23,669 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 18 from Vox, 47 fro AS 2024-08-18 17:39:27,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 550, loss[loss=0.1087, beats_loss=0.00815, ecapa_loss=0.0001223, whisper_loss=0.09937, over 20332.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01023, ecapa_loss=0.0001438, whisper_loss=0.08982, over 3549979.93 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:39:35,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4016580.0, ans=0.125 2024-08-18 17:39:36,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4016580.0, ans=0.0 2024-08-18 17:39:37,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-08-18 17:39:38,817 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
26 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 17:39:42,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0 2024-08-18 17:39:57,542 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 17:40:05,113 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 17:40:18,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4016980.0, ans=0.125 2024-08-18 17:40:21,147 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 17:40:29,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 600, loss[loss=0.08784, beats_loss=0.01253, ecapa_loss=0.0001372, whisper_loss=0.07394, over 22758.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01025, ecapa_loss=0.0001432, whisper_loss=0.09017, over 3624056.61 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:40:36,913 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.365e+01 2.589e+01 2.843e+01 3.555e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-18 17:40:42,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4017180.0, ans=0.125 2024-08-18 17:41:04,854 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 23 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 17:41:14,655 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 17:41:15,927 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 17:41:22,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2024-08-18 17:41:28,286 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 17:41:31,748 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 650, loss[loss=0.09565, beats_loss=0.009485, ecapa_loss=0.0001311, whisper_loss=0.08486, over 17609.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01022, ecapa_loss=0.000143, whisper_loss=0.09019, over 3677643.60 frames. ], batch size: 67, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:41:32,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4017580.0, ans=0.125 2024-08-18 17:42:34,220 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 700, loss[loss=0.1101, beats_loss=0.008266, ecapa_loss=0.0001317, whisper_loss=0.1005, over 18280.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0101, ecapa_loss=0.000144, whisper_loss=0.09123, over 3679497.08 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:42:41,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.261e+01 2.566e+01 2.914e+01 5.332e+01, threshold=5.131e+01, percent-clipped=1.0 2024-08-18 17:42:41,666 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 17:42:57,940 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-18 17:43:06,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4018280.0, ans=0.0 2024-08-18 17:43:14,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. 
limit=22.5 2024-08-18 17:43:20,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4018380.0, ans=0.125 2024-08-18 17:43:28,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4018480.0, ans=0.1 2024-08-18 17:43:33,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4018480.0, ans=0.125 2024-08-18 17:43:36,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 750, loss[loss=0.08588, beats_loss=0.01331, ecapa_loss=0.0001088, whisper_loss=0.07148, over 15987.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01017, ecapa_loss=0.0001435, whisper_loss=0.09079, over 3686447.16 frames. ], batch size: 62, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:43:50,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4018680.0, ans=0.5 2024-08-18 17:43:50,584 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-08-18 17:43:51,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4018680.0, ans=0.0 2024-08-18 17:43:56,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4018680.0, ans=0.125 2024-08-18 17:43:58,354 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
21 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 17:44:10,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4018780.0, ans=0.125 2024-08-18 17:44:13,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4018880.0, ans=0.2 2024-08-18 17:44:17,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4018880.0, ans=0.125 2024-08-18 17:44:26,013 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 23 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-18 17:44:26,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4018980.0, ans=0.035 2024-08-18 17:44:28,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4018980.0, ans=0.125 2024-08-18 17:44:38,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 800, loss[loss=0.08521, beats_loss=0.01159, ecapa_loss=0.0001508, whisper_loss=0.0721, over 17505.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01018, ecapa_loss=0.0001426, whisper_loss=0.09037, over 3674486.90 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:44:45,672 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.204e+01 2.473e+01 2.754e+01 3.605e+01, threshold=4.946e+01, percent-clipped=0.0 2024-08-18 17:44:50,663 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 17:45:01,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4019180.0, ans=0.0 2024-08-18 17:45:02,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4019280.0, ans=0.125 2024-08-18 17:45:18,390 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 17:45:18,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4019380.0, ans=0.125 2024-08-18 17:45:21,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4019380.0, ans=0.1 2024-08-18 17:45:40,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 850, loss[loss=0.113, beats_loss=0.01008, ecapa_loss=0.0001523, whisper_loss=0.1014, over 19904.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01026, ecapa_loss=0.0001424, whisper_loss=0.08983, over 3721781.10 frames. ], batch size: 80, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:45:42,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4019580.0, ans=0.125 2024-08-18 17:45:58,263 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
27 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 17:45:58,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4019680.0, ans=0.125 2024-08-18 17:46:07,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4019780.0, ans=0.125 2024-08-18 17:46:09,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4019780.0, ans=0.0 2024-08-18 17:46:20,660 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 17:46:35,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4019980.0, ans=0.0 2024-08-18 17:46:42,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 900, loss[loss=0.1048, beats_loss=0.01099, ecapa_loss=0.0001799, whisper_loss=0.09204, over 21516.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01018, ecapa_loss=0.0001428, whisper_loss=0.09003, over 3731483.48 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:46:46,635 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 17:46:47,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4020080.0, ans=0.0 2024-08-18 17:46:50,213 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.258e+01 2.407e+01 2.605e+01 4.279e+01, threshold=4.815e+01, percent-clipped=0.0 2024-08-18 17:46:51,945 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 17:47:14,406 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-18 17:47:16,887 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 17:47:18,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4020280.0, ans=0.125 2024-08-18 17:47:22,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4020380.0, ans=0.09899494936611666 2024-08-18 17:47:25,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4020380.0, ans=0.125 2024-08-18 17:47:38,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4020480.0, ans=0.125 2024-08-18 17:47:44,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-08-18 17:47:45,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 950, loss[loss=0.07557, beats_loss=0.01045, ecapa_loss=0.0001225, whisper_loss=0.0639, over 17142.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01019, ecapa_loss=0.0001422, whisper_loss=0.08986, over 3751927.32 frames. ], batch size: 66, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:47:45,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4020580.0, ans=0.125 2024-08-18 17:47:51,710 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
20 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-18 17:48:03,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4020680.0, ans=0.125 2024-08-18 17:48:06,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4020680.0, ans=0.125 2024-08-18 17:48:15,331 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 17:48:27,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4020880.0, ans=0.1 2024-08-18 17:48:40,031 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 17:48:47,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1000, loss[loss=0.1165, beats_loss=0.008792, ecapa_loss=0.0001745, whisper_loss=0.106, over 22683.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01024, ecapa_loss=0.0001416, whisper_loss=0.09008, over 3768100.17 frames. 
], batch size: 94, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:48:47,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4021080.0, ans=0.0 2024-08-18 17:48:47,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4021080.0, ans=0.0 2024-08-18 17:48:47,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4021080.0, ans=0.125 2024-08-18 17:48:54,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4021080.0, ans=0.1 2024-08-18 17:48:54,812 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.721e+01 2.179e+01 2.463e+01 2.751e+01 3.706e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-18 17:49:04,978 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 17:49:05,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4021180.0, ans=0.125 2024-08-18 17:49:06,690 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=3.189e+00 2024-08-18 17:49:13,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4021280.0, ans=0.125 2024-08-18 17:49:37,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4021480.0, ans=0.0 2024-08-18 17:49:49,834 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-18 17:49:50,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1050, loss[loss=0.08953, beats_loss=0.01306, ecapa_loss=0.0001286, whisper_loss=0.07518, over 19342.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.0103, ecapa_loss=0.0001409, whisper_loss=0.09004, over 3767672.07 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:49:55,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4021580.0, ans=0.1 2024-08-18 17:49:58,327 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.963e+05 2024-08-18 17:50:01,866 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 17:50:03,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4021680.0, ans=0.125 2024-08-18 17:50:09,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4021680.0, ans=0.125 2024-08-18 17:50:14,034 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-18 17:50:31,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4021880.0, ans=0.2 2024-08-18 17:50:45,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4021980.0, ans=0.125 2024-08-18 17:50:54,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1100, loss[loss=0.1245, beats_loss=0.00678, ecapa_loss=0.0001811, whisper_loss=0.1159, over 22081.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01025, ecapa_loss=0.0001417, whisper_loss=0.0898, over 3788679.49 frames. ], batch size: 87, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:50:58,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4022080.0, ans=0.125 2024-08-18 17:51:00,988 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 17:51:01,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4022080.0, ans=0.0 2024-08-18 17:51:01,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.87 vs. limit=6.0 2024-08-18 17:51:01,992 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.946e+01 2.358e+01 2.562e+01 2.926e+01 4.573e+02, threshold=5.124e+01, percent-clipped=2.0 2024-08-18 17:51:06,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4022180.0, ans=0.125 2024-08-18 17:51:14,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4022180.0, ans=0.125 2024-08-18 17:51:17,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4022180.0, ans=0.1 2024-08-18 17:51:27,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4022280.0, ans=0.125 2024-08-18 17:51:46,596 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.373e-02 2024-08-18 17:51:54,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4022480.0, ans=0.125 2024-08-18 17:51:55,559 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-18 17:51:58,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1150, loss[loss=0.1043, beats_loss=0.008887, ecapa_loss=0.0001142, whisper_loss=0.09431, over 18825.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01026, ecapa_loss=0.0001417, whisper_loss=0.08962, over 3846503.11 frames. ], batch size: 68, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:51:59,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4022580.0, ans=0.1 2024-08-18 17:52:05,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4022580.0, ans=0.125 2024-08-18 17:52:17,825 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 17:52:19,919 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4022680.0, ans=0.125 2024-08-18 17:52:46,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4022880.0, ans=0.125 2024-08-18 17:52:51,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4022980.0, ans=0.125 2024-08-18 17:53:05,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1200, loss[loss=0.102, beats_loss=0.0114, ecapa_loss=0.0001463, whisper_loss=0.08918, over 18855.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01026, ecapa_loss=0.000142, whisper_loss=0.0893, over 3798212.91 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:53:13,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.256e+01 2.483e+01 2.794e+01 3.745e+01, threshold=4.967e+01, percent-clipped=0.0 2024-08-18 17:53:15,653 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
17 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 17:53:39,787 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03793781250715256, model_norm_threshold=49.66889572143555 2024-08-18 17:53:39,954 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.273e+05, grad_sumsq=2.197e+07, orig_rms_sq=1.035e-02 2024-08-18 17:53:42,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.62 vs. limit=10.0 2024-08-18 17:53:49,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4023380.0, ans=0.1 2024-08-18 17:54:01,145 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-18 17:54:07,869 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 17:54:14,831 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1250, loss[loss=0.09999, beats_loss=0.009943, ecapa_loss=0.0001368, whisper_loss=0.08867, over 22221.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01025, ecapa_loss=0.000142, whisper_loss=0.08953, over 3822103.33 frames. ], batch size: 87, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:54:19,803 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
28 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-18 17:54:56,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4023780.0, ans=0.1 2024-08-18 17:55:12,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4023880.0, ans=0.0 2024-08-18 17:55:18,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4023980.0, ans=0.1 2024-08-18 17:55:29,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1300, loss[loss=0.1039, beats_loss=0.01092, ecapa_loss=0.0001223, whisper_loss=0.09179, over 22812.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.000141, whisper_loss=0.0892, over 3797273.99 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:55:38,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.331e+01 2.596e+01 3.040e+01 1.309e+03, threshold=5.193e+01, percent-clipped=2.0 2024-08-18 17:55:45,840 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-18 17:55:56,757 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-18 17:56:06,262 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 17:56:15,010 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
26 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-18 17:56:34,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4024480.0, ans=0.1 2024-08-18 17:56:42,452 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1350, loss[loss=0.1095, beats_loss=0.01247, ecapa_loss=0.0001332, whisper_loss=0.0957, over 17566.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01031, ecapa_loss=0.0001419, whisper_loss=0.08939, over 3782775.91 frames. ], batch size: 70, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:56:42,777 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-18 17:57:01,263 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 29 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-18 17:57:07,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4024680.0, ans=0.0 2024-08-18 17:57:17,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4024780.0, ans=0.0 2024-08-18 17:57:24,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4024880.0, ans=0.125 2024-08-18 17:57:26,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4024880.0, ans=0.125 2024-08-18 17:57:27,240 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-18 17:57:34,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4024880.0, ans=0.125 2024-08-18 17:57:48,812 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-18 17:57:54,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1400, loss[loss=0.1125, beats_loss=0.009317, ecapa_loss=0.0001333, whisper_loss=0.1019, over 19984.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08942, over 3777888.93 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 1.152921504606847e+18 2024-08-18 17:58:02,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.705e+01 2.174e+01 2.386e+01 2.635e+01 4.112e+01, threshold=4.772e+01, percent-clipped=0.0 2024-08-18 17:58:21,564 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 17:58:22,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4025280.0, ans=0.0 2024-08-18 17:58:25,006 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-18 17:58:31,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.80 vs. limit=15.0 2024-08-18 17:58:38,470 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 17:58:49,588 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 17:58:49,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4025380.0, ans=0.0 2024-08-18 17:58:49,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4025380.0, ans=0.2 2024-08-18 17:58:51,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2024-08-18 17:58:58,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4025480.0, ans=0.1 2024-08-18 17:59:06,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1450, loss[loss=0.0581, beats_loss=0.01101, ecapa_loss=0.0001268, whisper_loss=0.04582, over 18589.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.0104, ecapa_loss=0.0001407, whisper_loss=0.08859, over 3764454.97 frames. ], batch size: 73, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 17:59:39,184 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=22.5 2024-08-18 17:59:39,982 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-18 18:00:02,342 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 18:00:05,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4025780.0, ans=0.125 2024-08-18 18:00:05,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.19 vs. limit=15.0 2024-08-18 18:00:06,369 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 18:00:29,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4025880.0, ans=0.2 2024-08-18 18:00:32,037 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.80 vs. limit=10.0 2024-08-18 18:00:37,133 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.82 vs. 
limit=22.5 2024-08-18 18:00:39,097 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 18:00:46,962 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1500, loss[loss=0.09142, beats_loss=0.01343, ecapa_loss=0.0001191, whisper_loss=0.0768, over 18871.00 frames. ], tot_loss[loss=0.09999, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08816, over 3757138.58 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:00:52,809 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 18:00:57,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.252e+01 2.528e+01 2.901e+01 4.004e+01, threshold=5.056e+01, percent-clipped=0.0 2024-08-18 18:01:04,988 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-18 18:01:13,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4026180.0, ans=0.0 2024-08-18 18:01:13,684 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4026180.0, ans=0.125 2024-08-18 18:01:19,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4026280.0, ans=0.0 2024-08-18 18:01:21,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2024-08-18 18:01:32,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4026380.0, ans=0.125 2024-08-18 18:01:41,069 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
15 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 18:01:48,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4026480.0, ans=0.125 2024-08-18 18:01:59,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1550, loss[loss=0.1084, beats_loss=0.008676, ecapa_loss=0.0001185, whisper_loss=0.09852, over 15815.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.08826, over 3783433.74 frames. ], batch size: 56, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:02:00,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4026580.0, ans=0.125 2024-08-18 18:02:08,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4026580.0, ans=0.125 2024-08-18 18:02:32,105 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 18:02:56,256 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.24 vs. limit=6.0 2024-08-18 18:02:58,680 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 18:03:09,381 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1600, loss[loss=0.1147, beats_loss=0.01066, ecapa_loss=0.0001688, whisper_loss=0.1023, over 22483.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.08831, over 3815412.40 frames. ], batch size: 94, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:03:11,598 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 18:03:11,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4027080.0, ans=0.0 2024-08-18 18:03:14,086 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-18 18:03:15,348 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-18 18:03:17,977 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 18:03:19,040 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.241e+01 2.454e+01 2.842e+01 4.448e+01, threshold=4.908e+01, percent-clipped=0.0 2024-08-18 18:03:21,792 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 18:03:26,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4027180.0, ans=0.2 2024-08-18 18:03:42,533 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 18:03:42,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4027280.0, ans=0.125 2024-08-18 18:03:59,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4027380.0, ans=0.0 2024-08-18 18:04:01,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4027380.0, ans=0.1 2024-08-18 18:04:08,106 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
30 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-18 18:04:18,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4027580.0, ans=0.0 2024-08-18 18:04:18,839 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1650, loss[loss=0.1071, beats_loss=0.009375, ecapa_loss=0.000136, whisper_loss=0.09633, over 16488.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01052, ecapa_loss=0.0001385, whisper_loss=0.08858, over 3837926.53 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 18:04:22,440 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:04:26,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4027580.0, ans=0.0 2024-08-18 18:04:34,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4027680.0, ans=0.125 2024-08-18 18:04:35,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4027680.0, ans=0.125 2024-08-18 18:04:37,778 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-18 18:04:41,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0 2024-08-18 18:04:44,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4027680.0, ans=0.125 2024-08-18 18:05:08,250 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 18:05:20,614 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.149e+05 2024-08-18 18:05:26,420 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1700, loss[loss=0.08187, beats_loss=0.01235, ecapa_loss=0.0001546, whisper_loss=0.06797, over 19498.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01046, ecapa_loss=0.0001398, whisper_loss=0.08906, over 3844047.52 frames. ], batch size: 83, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:05:37,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.383e+01 2.589e+01 2.926e+01 5.501e+01, threshold=5.178e+01, percent-clipped=1.0 2024-08-18 18:05:52,520 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 18:06:06,740 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 18 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 18:06:17,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0 2024-08-18 18:06:23,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4028480.0, ans=0.0 2024-08-18 18:06:27,284 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 18:06:32,297 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1750, loss[loss=0.08966, beats_loss=0.009718, ecapa_loss=0.0001667, whisper_loss=0.07827, over 21441.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01036, ecapa_loss=0.0001418, whisper_loss=0.08951, over 3841740.78 frames. 
], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:06:46,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4028680.0, ans=0.125 2024-08-18 18:06:46,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4028680.0, ans=0.2 2024-08-18 18:06:48,128 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4028680.0, ans=0.0 2024-08-18 18:07:07,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4028780.0, ans=0.125 2024-08-18 18:07:11,354 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 18:07:16,763 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 18:07:44,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4028980.0, ans=0.0 2024-08-18 18:07:50,232 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 18:07:51,187 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1800, loss[loss=0.1039, beats_loss=0.009208, ecapa_loss=0.0001485, whisper_loss=0.09319, over 20296.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.08951, over 3858910.11 frames. 
], batch size: 74, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:07:56,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4029080.0, ans=0.0 2024-08-18 18:08:03,007 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.191e+01 2.428e+01 2.696e+01 4.164e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-18 18:08:33,258 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4029280.0, ans=0.125 2024-08-18 18:08:34,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4029380.0, ans=0.2 2024-08-18 18:08:43,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.19 vs. limit=15.0 2024-08-18 18:08:47,656 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 18:08:55,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4029480.0, ans=0.2 2024-08-18 18:09:00,328 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 18:09:04,173 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1850, loss[loss=0.09313, beats_loss=0.009198, ecapa_loss=0.0001454, whisper_loss=0.08248, over 14659.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001419, whisper_loss=0.08945, over 3825238.47 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:09:15,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4029580.0, ans=0.125 2024-08-18 18:09:16,129 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 18:09:19,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4029680.0, ans=0.1 2024-08-18 18:09:29,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4029680.0, ans=0.1 2024-08-18 18:09:36,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4029780.0, ans=0.0 2024-08-18 18:09:42,988 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 18:10:15,860 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1900, loss[loss=0.1182, beats_loss=0.0103, ecapa_loss=0.0002188, whisper_loss=0.1057, over 13039.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01033, ecapa_loss=0.0001419, whisper_loss=0.08896, over 3830335.33 frames. ], batch size: 57, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:10:20,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4030080.0, ans=0.0 2024-08-18 18:10:27,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.264e+01 2.517e+01 2.852e+01 3.741e+01, threshold=5.034e+01, percent-clipped=0.0 2024-08-18 18:10:35,329 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:10:39,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4030180.0, ans=0.0 2024-08-18 18:10:47,653 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 18:10:59,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4030380.0, ans=0.0 2024-08-18 18:11:00,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=4030380.0, ans=10.0 2024-08-18 18:11:07,679 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-18 18:11:09,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4030380.0, ans=0.125 2024-08-18 18:11:10,879 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-18 18:11:20,757 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-18 18:11:20,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4030480.0, ans=0.1 2024-08-18 18:11:21,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4030480.0, ans=0.1 2024-08-18 18:11:25,100 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 18:11:26,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4030580.0, ans=0.07 2024-08-18 18:11:27,279 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 1950, loss[loss=0.1179, beats_loss=0.01005, ecapa_loss=0.0001249, whisper_loss=0.1066, over 15254.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01028, ecapa_loss=0.0001417, whisper_loss=0.08945, over 3829373.44 frames. 
], batch size: 60, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:11:33,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4030580.0, ans=0.0 2024-08-18 18:11:47,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4030680.0, ans=0.0 2024-08-18 18:12:10,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4030880.0, ans=0.09899494936611666 2024-08-18 18:12:12,447 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-18 18:12:35,114 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 18:12:38,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2000, loss[loss=0.1046, beats_loss=0.008894, ecapa_loss=0.0001512, whisper_loss=0.09416, over 22233.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01032, ecapa_loss=0.0001401, whisper_loss=0.08917, over 3843307.21 frames. 
], batch size: 88, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:12:39,114 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:12:40,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4031080.0, ans=0.125 2024-08-18 18:12:47,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4031080.0, ans=0.0 2024-08-18 18:12:49,416 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.230e+01 2.583e+01 2.898e+01 3.757e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-18 18:13:15,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-18 18:13:19,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4031380.0, ans=0.125 2024-08-18 18:13:25,794 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.945e+01 2024-08-18 18:13:32,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4031380.0, ans=0.025 2024-08-18 18:13:35,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2024-08-18 18:13:50,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2050, loss[loss=0.06937, beats_loss=0.01108, ecapa_loss=0.000153, whisper_loss=0.05676, over 14053.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01028, ecapa_loss=0.0001397, whisper_loss=0.09004, over 3862457.25 frames. 
], batch size: 58, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:13:54,809 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 18:13:59,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4031580.0, ans=0.125 2024-08-18 18:14:01,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4031580.0, ans=0.2 2024-08-18 18:14:11,738 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:14:12,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2024-08-18 18:14:35,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4031880.0, ans=0.1 2024-08-18 18:14:37,087 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 18:14:49,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4031980.0, ans=0.0 2024-08-18 18:14:52,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4031980.0, ans=0.125 2024-08-18 18:14:55,097 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 18:14:58,026 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.02 vs. limit=15.0 2024-08-18 18:14:59,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2100, loss[loss=0.1036, beats_loss=0.0102, ecapa_loss=0.0001148, whisper_loss=0.09221, over 18406.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01035, ecapa_loss=0.0001386, whisper_loss=0.08985, over 3866073.24 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:15:04,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4032080.0, ans=0.125 2024-08-18 18:15:11,528 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.315e+01 2.589e+01 2.844e+01 4.091e+01, threshold=5.179e+01, percent-clipped=0.0 2024-08-18 18:15:11,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=4032080.0, ans=0.5 2024-08-18 18:15:23,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4032180.0, ans=0.125 2024-08-18 18:15:23,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4032180.0, ans=0.0 2024-08-18 18:15:25,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4032180.0, ans=0.125 2024-08-18 18:15:32,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4032280.0, ans=0.2 2024-08-18 18:15:37,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4032280.0, ans=0.1 2024-08-18 18:15:37,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4032280.0, ans=0.0 2024-08-18 18:16:05,605 WARNING [optim.py:496] (3/4) Scaling gradients by 0.018764860928058624, model_norm_threshold=51.787418365478516 2024-08-18 18:16:05,773 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.003e+06, 
grad_sumsq=1.941e+08, orig_rms_sq=1.032e-02 2024-08-18 18:16:10,271 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 14 from Vox, 34 fro AS 2024-08-18 18:16:12,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2150, loss[loss=0.07157, beats_loss=0.01425, ecapa_loss=0.0001301, whisper_loss=0.05602, over 13553.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01045, ecapa_loss=0.000138, whisper_loss=0.08948, over 3859552.62 frames. ], batch size: 54, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:16:13,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4032580.0, ans=0.125 2024-08-18 18:16:30,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.51 vs. limit=12.0 2024-08-18 18:16:32,027 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2024-08-18 18:16:35,422 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 18:16:44,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4032780.0, ans=0.1 2024-08-18 18:16:45,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4032780.0, ans=0.125 2024-08-18 18:16:55,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4032880.0, ans=15.0 2024-08-18 18:17:00,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4032880.0, ans=0.2 2024-08-18 18:17:23,049 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2200, loss[loss=0.0914, beats_loss=0.01149, ecapa_loss=0.0001449, whisper_loss=0.07846, over 21587.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001381, whisper_loss=0.08977, over 3819239.22 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:17:23,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4033080.0, ans=0.125 2024-08-18 18:17:34,268 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.249e+01 2.473e+01 2.891e+01 2.760e+03, threshold=4.945e+01, percent-clipped=3.0 2024-08-18 18:17:34,395 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 13 from Vox, 45 fro AS 2024-08-18 18:18:03,593 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. 
limit=15.0 2024-08-18 18:18:07,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4033380.0, ans=0.0 2024-08-18 18:18:14,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4033380.0, ans=0.0 2024-08-18 18:18:15,758 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 18:18:21,147 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-18 18:18:34,898 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2250, loss[loss=0.0998, beats_loss=0.0119, ecapa_loss=0.0001271, whisper_loss=0.08664, over 22989.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001397, whisper_loss=0.09033, over 3832055.63 frames. ], batch size: 92, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:18:36,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4033580.0, ans=10.0 2024-08-18 18:18:48,824 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 18:19:11,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4033780.0, ans=0.125 2024-08-18 18:19:13,522 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4033780.0, ans=0.5 2024-08-18 18:19:17,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4033880.0, ans=0.2 2024-08-18 18:19:23,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.08 vs. 
limit=15.0 2024-08-18 18:19:25,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=4033880.0, ans=22.5 2024-08-18 18:19:44,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2300, loss[loss=0.1201, beats_loss=0.009262, ecapa_loss=0.0001539, whisper_loss=0.1093, over 22850.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01049, ecapa_loss=0.0001401, whisper_loss=0.09125, over 3850723.36 frames. ], batch size: 89, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:19:55,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4034080.0, ans=0.125 2024-08-18 18:19:55,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.302e+01 2.462e+01 2.661e+01 7.808e+01, threshold=4.924e+01, percent-clipped=1.0 2024-08-18 18:20:15,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4034280.0, ans=0.2 2024-08-18 18:20:31,397 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-18 18:20:39,412 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-18 18:20:52,189 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2350, loss[loss=0.12, beats_loss=0.008703, ecapa_loss=0.0001569, whisper_loss=0.1098, over 20592.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.09132, over 3846713.18 frames. ], batch size: 81, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:20:52,724 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
36 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 18:20:52,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4034580.0, ans=0.0 2024-08-18 18:21:30,848 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-18 18:21:32,097 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-18 18:22:01,333 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2400, loss[loss=0.1188, beats_loss=0.008495, ecapa_loss=0.0001152, whisper_loss=0.1091, over 20327.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001412, whisper_loss=0.09149, over 3867896.94 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:22:01,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4035080.0, ans=0.0 2024-08-18 18:22:04,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4035080.0, ans=0.95 2024-08-18 18:22:05,441 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 19 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-18 18:22:09,421 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 18:22:11,485 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.715e+01 2.282e+01 2.511e+01 2.769e+01 4.268e+01, threshold=5.023e+01, percent-clipped=0.0 2024-08-18 18:22:26,020 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 18:22:31,693 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 18:22:37,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4035280.0, ans=0.1 2024-08-18 18:22:44,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2024-08-18 18:22:53,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4035380.0, ans=0.125 2024-08-18 18:22:57,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4035480.0, ans=0.125 2024-08-18 18:23:04,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4035480.0, ans=0.1 2024-08-18 18:23:09,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2450, loss[loss=0.08122, beats_loss=0.01277, ecapa_loss=0.000153, whisper_loss=0.06691, over 17307.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01043, ecapa_loss=0.0001412, whisper_loss=0.09165, over 3865500.43 frames. ], batch size: 74, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:23:18,226 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-18 18:23:25,241 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 33 from Vox, 34 fro AS 2024-08-18 18:23:27,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4035680.0, ans=0.0 2024-08-18 18:23:35,836 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
35 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 18:23:38,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4035780.0, ans=0.125 2024-08-18 18:23:49,580 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 29 from Vox, 37 fro AS 2024-08-18 18:23:57,685 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 28 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 18:24:12,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. limit=15.0 2024-08-18 18:24:30,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2024-08-18 18:24:30,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2500, loss[loss=0.09105, beats_loss=0.01036, ecapa_loss=0.0001434, whisper_loss=0.07926, over 19442.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.09201, over 3873038.95 frames. ], batch size: 79, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:24:44,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.285e+01 2.484e+01 2.880e+01 1.174e+02, threshold=4.969e+01, percent-clipped=1.0 2024-08-18 18:24:47,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4036180.0, ans=0.125 2024-08-18 18:24:59,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4036180.0, ans=0.0 2024-08-18 18:25:03,910 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 18:25:17,154 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 18:25:27,523 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-18 18:25:28,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4036380.0, ans=0.0 2024-08-18 18:25:35,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4036380.0, ans=0.2 2024-08-18 18:25:53,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4036480.0, ans=0.0 2024-08-18 18:26:01,416 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 18:26:03,565 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2550, loss[loss=0.137, beats_loss=0.007733, ecapa_loss=0.0001591, whisper_loss=0.1277, over 17722.00 frames. ], tot_loss[loss=0.1042, beats_loss=0.01028, ecapa_loss=0.0001409, whisper_loss=0.09248, over 3883124.80 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:26:16,738 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-18 18:26:42,270 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.659e+01 2024-08-18 18:27:04,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4036880.0, ans=0.2 2024-08-18 18:27:18,524 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2024-08-18 18:27:25,530 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 18:27:31,655 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2600, loss[loss=0.07395, beats_loss=0.01106, ecapa_loss=0.0001987, whisper_loss=0.0609, over 16005.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.09129, over 3892374.15 frames. ], batch size: 71, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:27:34,785 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 15 from Vox, 44 fro AS 2024-08-18 18:27:36,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4037080.0, ans=0.125 2024-08-18 18:27:43,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.385e+01 2.553e+01 2.816e+01 4.584e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-18 18:27:54,592 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 18:27:58,762 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-18 18:28:04,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4037280.0, ans=0.2 2024-08-18 18:28:12,442 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 18:28:18,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4037280.0, ans=0.125 2024-08-18 18:28:46,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. 
limit=15.0 2024-08-18 18:28:51,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4037480.0, ans=0.125 2024-08-18 18:28:59,088 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2650, loss[loss=0.08285, beats_loss=0.01323, ecapa_loss=0.0001606, whisper_loss=0.06801, over 21999.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001424, whisper_loss=0.09092, over 3877128.79 frames. ], batch size: 95, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:29:15,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.16 vs. limit=22.5 2024-08-18 18:29:21,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4037680.0, ans=0.0 2024-08-18 18:29:48,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4037780.0, ans=0.1 2024-08-18 18:29:51,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4037780.0, ans=0.0 2024-08-18 18:29:56,213 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-18 18:30:03,478 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 18:30:16,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4037980.0, ans=0.125 2024-08-18 18:30:17,900 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 18:30:21,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2024-08-18 18:30:30,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2024-08-18 18:30:35,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2700, loss[loss=0.1035, beats_loss=0.008002, ecapa_loss=0.0001536, whisper_loss=0.094, over 14773.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.09029, over 3883959.52 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:30:46,325 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 18:30:48,670 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.289e+01 2.510e+01 2.864e+01 4.358e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-18 18:31:07,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4038280.0, ans=0.125 2024-08-18 18:31:15,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4038280.0, ans=0.125 2024-08-18 18:31:21,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4038380.0, ans=0.125 2024-08-18 18:31:41,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4038480.0, ans=0.0 2024-08-18 18:31:42,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4038480.0, ans=0.125 2024-08-18 18:31:49,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2750, loss[loss=0.1038, beats_loss=0.01074, ecapa_loss=0.0001621, whisper_loss=0.09149, over 17451.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001417, whisper_loss=0.09061, over 3868633.39 frames. 
], batch size: 70, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:31:49,300 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 21 from LS+wenet, 20 from Vox, 53 fro AS 2024-08-18 18:31:50,146 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5 2024-08-18 18:32:19,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4038780.0, ans=0.125 2024-08-18 18:32:21,725 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 18:32:29,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4038780.0, ans=0.0 2024-08-18 18:32:29,996 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 18:32:40,189 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-18 18:32:49,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4038980.0, ans=0.1 2024-08-18 18:32:50,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=15.0 2024-08-18 18:32:50,467 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 14 from Vox, 46 fro AS 2024-08-18 18:33:01,163 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2800, loss[loss=0.07721, beats_loss=0.01396, ecapa_loss=0.0001106, whisper_loss=0.06215, over 18516.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09127, over 3869609.14 frames. 
], batch size: 70, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:33:01,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4039080.0, ans=0.125 2024-08-18 18:33:14,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.362e+01 2.601e+01 2.838e+01 4.412e+01, threshold=5.203e+01, percent-clipped=0.0 2024-08-18 18:33:23,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4039180.0, ans=0.125 2024-08-18 18:33:23,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.86 vs. limit=15.0 2024-08-18 18:33:24,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4039180.0, ans=0.1 2024-08-18 18:33:54,134 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.20 vs. limit=6.0 2024-08-18 18:34:00,970 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 18:34:03,644 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0 2024-08-18 18:34:10,051 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 18:34:14,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4039480.0, ans=0.07 2024-08-18 18:34:16,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4039480.0, ans=0.1 2024-08-18 18:34:17,730 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4039480.0, ans=0.5 2024-08-18 18:34:28,191 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2850, loss[loss=0.113, beats_loss=0.01011, ecapa_loss=0.0001404, whisper_loss=0.1014, over 19334.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.000141, whisper_loss=0.09072, over 3863995.31 frames. ], batch size: 76, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:34:34,178 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-18 18:34:44,697 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 11 from Vox, 43 fro AS 2024-08-18 18:35:00,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4039680.0, ans=0.0 2024-08-18 18:35:09,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4039780.0, ans=0.0 2024-08-18 18:35:11,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4039780.0, ans=0.125 2024-08-18 18:35:15,859 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-18 18:35:25,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4039880.0, ans=0.1 2024-08-18 18:35:52,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4039980.0, ans=0.125 2024-08-18 18:35:56,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4039980.0, ans=0.07 2024-08-18 18:36:05,231 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2900, loss[loss=0.1019, beats_loss=0.009548, ecapa_loss=0.0001396, whisper_loss=0.09095, over 15924.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.000141, whisper_loss=0.09111, over 3877067.78 frames. ], batch size: 64, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:36:08,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4040080.0, ans=0.2 2024-08-18 18:36:17,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4040080.0, ans=0.0 2024-08-18 18:36:21,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.803e+01 2.256e+01 2.587e+01 2.877e+01 4.773e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-18 18:36:31,392 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4040180.0, ans=0.125 2024-08-18 18:36:37,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4040180.0, ans=0.125 2024-08-18 18:36:48,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4040280.0, ans=0.5 2024-08-18 18:36:55,183 INFO [scaling.py:1024] (3/4) Whitening: 
name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.62 vs. limit=22.5 2024-08-18 18:37:20,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4040480.0, ans=0.1 2024-08-18 18:37:32,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 2950, loss[loss=0.1082, beats_loss=0.01111, ecapa_loss=0.0001441, whisper_loss=0.09561, over 22340.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01052, ecapa_loss=0.0001418, whisper_loss=0.09114, over 3917317.18 frames. ], batch size: 91, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:38:01,493 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 18:38:21,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4040780.0, ans=0.0 2024-08-18 18:38:50,948 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 12 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 18:38:53,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4040980.0, ans=0.2 2024-08-18 18:38:53,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4040980.0, ans=0.2 2024-08-18 18:39:11,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3000, loss[loss=0.08814, beats_loss=0.01032, ecapa_loss=0.0001678, whisper_loss=0.07614, over 14364.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001422, whisper_loss=0.09027, over 3945257.79 frames. 
], batch size: 56, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:39:11,647 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 18:39:56,898 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on ASR_libri: loss=0.2533, beats_loss=0, ecapa_loss=0.0005204, whisper_loss=0.2481, over 922467.00 frames. 2024-08-18 18:40:14,589 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on SV_voxceleb1: loss=0.004036, beats_loss=0, ecapa_loss=0.0004036, whisper_loss=0, over 939242.00 frames. 2024-08-18 18:41:48,180 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 18:41:48,184 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 18:41:48,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4041080.0, ans=0.125 2024-08-18 18:41:51,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.06 vs. limit=5.0 2024-08-18 18:41:55,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.76 vs. limit=15.0 2024-08-18 18:41:58,247 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.951e+01 2.290e+01 2.609e+01 2.861e+01 5.437e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-18 18:42:24,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. 
limit=10.0 2024-08-18 18:42:33,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4041280.0, ans=0.2 2024-08-18 18:43:04,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4041380.0, ans=0.125 2024-08-18 18:43:04,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4041380.0, ans=0.125 2024-08-18 18:43:11,997 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 18:43:42,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3050, loss[loss=0.08917, beats_loss=0.01121, ecapa_loss=0.0001134, whisper_loss=0.07682, over 17526.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.0001425, whisper_loss=0.09028, over 3949922.04 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:43:47,910 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-18 18:43:49,569 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-18 18:44:19,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4041680.0, ans=0.125 2024-08-18 18:44:31,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4041680.0, ans=0.0 2024-08-18 18:44:45,409 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 18:44:54,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4041780.0, ans=0.125 2024-08-18 18:45:07,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4041880.0, ans=0.1 2024-08-18 18:45:13,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4041880.0, ans=0.125 2024-08-18 18:45:49,931 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3100, loss[loss=0.09891, beats_loss=0.01146, ecapa_loss=0.000109, whisper_loss=0.08636, over 19989.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01055, ecapa_loss=0.0001434, whisper_loss=0.0905, over 3939201.62 frames. ], batch size: 75, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:46:10,559 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.308e+01 2.550e+01 2.809e+01 3.973e+01, threshold=5.100e+01, percent-clipped=1.0 2024-08-18 18:46:16,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4042180.0, ans=0.125 2024-08-18 18:47:17,488 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 18:47:28,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4042480.0, ans=0.0 2024-08-18 18:47:40,847 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3150, loss[loss=0.101, beats_loss=0.01032, ecapa_loss=0.0001552, whisper_loss=0.08912, over 16247.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01063, ecapa_loss=0.0001425, whisper_loss=0.08973, over 3885491.33 frames. 
], batch size: 68, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:47:41,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.41 vs. limit=10.0 2024-08-18 18:47:47,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4042580.0, ans=0.125 2024-08-18 18:47:48,854 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 25 from LS+wenet, 14 from Vox, 30 fro AS 2024-08-18 18:47:54,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4042580.0, ans=0.125 2024-08-18 18:48:00,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4042680.0, ans=0.125 2024-08-18 18:48:05,226 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 18:48:24,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.17 vs. limit=8.0 2024-08-18 18:48:26,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4042780.0, ans=0.125 2024-08-18 18:48:41,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4042780.0, ans=0.125 2024-08-18 18:49:04,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4042980.0, ans=0.95 2024-08-18 18:49:16,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3200, loss[loss=0.09329, beats_loss=0.01322, ecapa_loss=0.0001544, whisper_loss=0.07853, over 21875.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01061, ecapa_loss=0.0001425, whisper_loss=0.09017, over 3887514.73 frames. 
], batch size: 94, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:49:17,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.29 vs. limit=6.0 2024-08-18 18:49:20,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4043080.0, ans=0.2 2024-08-18 18:49:23,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4043080.0, ans=0.125 2024-08-18 18:49:23,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4043080.0, ans=0.125 2024-08-18 18:49:23,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.72 vs. limit=12.0 2024-08-18 18:49:29,054 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.766e+01 2.339e+01 2.552e+01 3.080e+01 4.481e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-18 18:49:29,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4043080.0, ans=0.0 2024-08-18 18:50:14,731 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-18 18:50:27,483 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-18 18:50:29,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4043480.0, ans=0.2 2024-08-18 18:50:34,878 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3250, loss[loss=0.1059, beats_loss=0.009399, ecapa_loss=0.0001585, whisper_loss=0.09491, over 18816.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.000143, whisper_loss=0.09036, over 3901534.02 frames. 
], batch size: 78, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:50:38,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4043580.0, ans=0.0 2024-08-18 18:50:54,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4043680.0, ans=0.0 2024-08-18 18:50:59,652 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=22.5 2024-08-18 18:51:00,160 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 18:51:16,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.69 vs. limit=10.0 2024-08-18 18:51:37,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4043980.0, ans=0.125 2024-08-18 18:51:50,246 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3300, loss[loss=0.1239, beats_loss=0.01007, ecapa_loss=0.0001442, whisper_loss=0.1124, over 20183.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01062, ecapa_loss=0.0001432, whisper_loss=0.09062, over 3881307.44 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:51:58,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4044080.0, ans=0.125 2024-08-18 18:52:03,003 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.380e+01 2.621e+01 2.872e+01 4.395e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-18 18:52:07,638 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
21 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 18:52:10,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0 2024-08-18 18:52:19,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4044180.0, ans=0.0 2024-08-18 18:52:46,543 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 24 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 18:52:52,862 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 18:52:58,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4044480.0, ans=0.125 2024-08-18 18:53:11,114 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3350, loss[loss=0.09376, beats_loss=0.0112, ecapa_loss=0.0001261, whisper_loss=0.0813, over 19735.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.000143, whisper_loss=0.09041, over 3866043.74 frames. ], batch size: 77, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:53:21,163 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 18:53:26,141 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
30 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-18 18:53:30,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4044680.0, ans=0.125 2024-08-18 18:53:30,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4044680.0, ans=0.0 2024-08-18 18:53:44,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4044780.0, ans=0.07 2024-08-18 18:53:44,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4044780.0, ans=0.125 2024-08-18 18:53:48,329 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 18:54:28,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3400, loss[loss=0.0877, beats_loss=0.007827, ecapa_loss=0.0001458, whisper_loss=0.07842, over 15433.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001429, whisper_loss=0.09063, over 3850385.54 frames. ], batch size: 60, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:54:33,445 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 18:54:40,787 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.177e+01 2.415e+01 2.723e+01 4.499e+01, threshold=4.829e+01, percent-clipped=0.0 2024-08-18 18:54:40,916 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 18:54:44,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4045180.0, ans=0.125 2024-08-18 18:54:52,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4045180.0, ans=0.125 2024-08-18 18:55:12,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4045280.0, ans=0.125 2024-08-18 18:55:16,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4045380.0, ans=0.125 2024-08-18 18:55:50,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2024-08-18 18:55:51,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3450, loss[loss=0.106, beats_loss=0.01028, ecapa_loss=0.0001198, whisper_loss=0.09448, over 17196.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001441, whisper_loss=0.08998, over 3840006.53 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:55:52,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4045580.0, ans=0.125 2024-08-18 18:56:13,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=15.0 2024-08-18 18:56:18,284 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 18:56:18,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4045680.0, ans=10.0 2024-08-18 18:56:31,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4045780.0, ans=0.025 2024-08-18 18:56:39,914 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-18 18:56:53,519 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-18 18:56:55,229 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-18 18:57:11,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3500, loss[loss=0.08446, beats_loss=0.01323, ecapa_loss=0.000101, whisper_loss=0.07022, over 16560.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001436, whisper_loss=0.08989, over 3839152.57 frames. ], batch size: 63, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:57:16,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.48 vs. limit=15.0 2024-08-18 18:57:17,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4046080.0, ans=0.2 2024-08-18 18:57:23,076 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.329e+01 2.522e+01 2.820e+01 3.952e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-18 18:57:26,466 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 18:57:41,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4046280.0, ans=0.125 2024-08-18 18:57:53,661 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-18 18:57:58,974 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 18:58:04,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4046380.0, ans=0.125 2024-08-18 18:58:19,942 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-18 18:58:32,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3550, loss[loss=0.09932, beats_loss=0.01215, ecapa_loss=0.0001467, whisper_loss=0.08571, over 21607.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001434, whisper_loss=0.09017, over 3865423.00 frames. ], batch size: 90, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 18:58:34,808 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 18:58:44,806 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 18:58:58,300 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 13 from Vox, 34 fro AS 2024-08-18 18:59:04,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4046780.0, ans=0.09899494936611666 2024-08-18 18:59:09,434 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-18 18:59:12,747 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 18:59:27,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4046880.0, ans=0.125 2024-08-18 18:59:57,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3600, loss[loss=0.1048, beats_loss=0.01252, ecapa_loss=9.988e-05, whisper_loss=0.09129, over 20429.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.000142, whisper_loss=0.09028, over 3878842.14 frames. ], batch size: 79, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 19:00:08,870 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.825e+01 2.393e+01 2.591e+01 2.982e+01 4.231e+01, threshold=5.182e+01, percent-clipped=0.0 2024-08-18 19:00:18,560 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 19:00:39,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4047380.0, ans=0.125 2024-08-18 19:00:40,950 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-08-18 19:00:42,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4047380.0, ans=0.125 2024-08-18 19:00:52,884 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
19 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-18 19:00:57,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4047480.0, ans=0.125 2024-08-18 19:00:57,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4047480.0, ans=0.125 2024-08-18 19:01:01,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4047480.0, ans=0.0 2024-08-18 19:01:09,429 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.93 vs. limit=15.0 2024-08-18 19:01:11,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3650, loss[loss=0.102, beats_loss=0.0131, ecapa_loss=0.0001584, whisper_loss=0.08731, over 19799.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001417, whisper_loss=0.08994, over 3829896.51 frames. ], batch size: 82, lr: 2.20e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 19:01:18,042 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.93 vs. limit=12.0 2024-08-18 19:01:29,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4047680.0, ans=0.125 2024-08-18 19:01:32,107 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
25 from LS+wenet, 22 from Vox, 22 fro AS 2024-08-18 19:01:39,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4047680.0, ans=0.09899494936611666 2024-08-18 19:01:45,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4047780.0, ans=0.125 2024-08-18 19:01:47,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4047780.0, ans=0.0 2024-08-18 19:01:57,367 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-18 19:02:12,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4047880.0, ans=0.0 2024-08-18 19:02:17,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4047980.0, ans=15.0 2024-08-18 19:02:21,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.17 vs. limit=15.0 2024-08-18 19:02:25,693 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=2.813e+00 2024-08-18 19:02:28,910 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-18 19:02:35,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3700, loss[loss=0.1001, beats_loss=0.01042, ecapa_loss=0.0001252, whisper_loss=0.08848, over 18189.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001432, whisper_loss=0.09053, over 3867665.03 frames. ], batch size: 69, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:02:40,965 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
13 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 19:02:47,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.250e+01 2.400e+01 2.703e+01 3.510e+01, threshold=4.800e+01, percent-clipped=0.0 2024-08-18 19:02:58,648 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 19:03:05,214 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 19:03:25,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4048380.0, ans=0.125 2024-08-18 19:03:29,081 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.78 vs. limit=15.0 2024-08-18 19:03:51,687 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 19:03:53,226 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3750, loss[loss=0.0931, beats_loss=0.01222, ecapa_loss=0.0001415, whisper_loss=0.07946, over 14424.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001428, whisper_loss=0.09043, over 3874615.62 frames. ], batch size: 59, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:03:57,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4048580.0, ans=0.0 2024-08-18 19:04:08,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4048680.0, ans=0.0 2024-08-18 19:04:16,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4048680.0, ans=0.04949747468305833 2024-08-18 19:04:33,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.11 vs. 
limit=22.5 2024-08-18 19:04:48,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4048880.0, ans=0.0 2024-08-18 19:04:55,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4048880.0, ans=0.125 2024-08-18 19:05:16,862 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-18 19:05:18,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3800, loss[loss=0.1152, beats_loss=0.01007, ecapa_loss=0.0001791, whisper_loss=0.1033, over 19427.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001441, whisper_loss=0.09005, over 3875587.12 frames. ], batch size: 80, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:05:25,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4049080.0, ans=0.2 2024-08-18 19:05:31,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.387e+01 2.639e+01 2.992e+01 4.413e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-18 19:05:55,678 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-18 19:06:12,540 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 31 from LS+wenet, 19 from Vox, 15 fro AS 2024-08-18 19:06:16,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4049380.0, ans=0.05 2024-08-18 19:06:18,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4049380.0, ans=0.125 2024-08-18 19:06:20,310 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.26 vs. 
limit=6.0 2024-08-18 19:06:28,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4049480.0, ans=0.0 2024-08-18 19:06:39,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3850, loss[loss=0.1116, beats_loss=0.008768, ecapa_loss=0.0001535, whisper_loss=0.1012, over 22075.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001439, whisper_loss=0.09071, over 3843193.71 frames. ], batch size: 88, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:06:40,112 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 19:07:03,642 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 19:07:10,537 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-18 19:07:18,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4049880.0, ans=0.0 2024-08-18 19:07:25,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4049880.0, ans=0.035 2024-08-18 19:07:30,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4049880.0, ans=0.05 2024-08-18 19:07:37,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4049980.0, ans=0.125 2024-08-18 19:07:41,736 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 25 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 19:07:45,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3900, loss[loss=0.1219, beats_loss=0.009686, ecapa_loss=0.000114, whisper_loss=0.1111, over 16182.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001439, whisper_loss=0.09084, over 3874706.39 frames. 
], batch size: 61, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:07:56,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.489e+01 2.786e+01 3.014e+01 3.884e+02, threshold=5.572e+01, percent-clipped=4.0 2024-08-18 19:08:00,436 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 19:08:01,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4050180.0, ans=0.0 2024-08-18 19:08:07,212 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 19:08:14,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-18 19:08:16,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4050280.0, ans=0.125 2024-08-18 19:08:17,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4050280.0, ans=0.1 2024-08-18 19:08:26,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4050380.0, ans=0.125 2024-08-18 19:08:30,991 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 19:08:31,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4050380.0, ans=0.2 2024-08-18 19:08:42,761 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 19:08:45,100 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-18 19:08:51,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 3950, loss[loss=0.1057, beats_loss=0.01077, ecapa_loss=0.0001432, whisper_loss=0.09347, over 22237.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.000145, whisper_loss=0.09086, over 3904447.62 frames. ], batch size: 88, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:08:55,406 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 19:08:55,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4050580.0, ans=0.125 2024-08-18 19:08:57,946 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 23 from Vox, 20 fro AS 2024-08-18 19:09:17,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4050780.0, ans=0.125 2024-08-18 19:09:26,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4050780.0, ans=0.125 2024-08-18 19:09:28,515 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-18 19:09:30,926 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 19:09:56,258 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4000, loss[loss=0.1135, beats_loss=0.009703, ecapa_loss=0.0001512, whisper_loss=0.1023, over 22021.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01043, ecapa_loss=0.0001448, whisper_loss=0.09104, over 3923293.77 frames. 
], batch size: 90, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:10:06,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.265e+01 2.552e+01 2.868e+01 4.279e+01, threshold=5.105e+01, percent-clipped=0.0 2024-08-18 19:10:13,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4051180.0, ans=0.125 2024-08-18 19:10:14,766 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 19:10:18,836 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 19:10:22,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4051280.0, ans=0.1 2024-08-18 19:10:29,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4051280.0, ans=0.125 2024-08-18 19:10:29,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4051280.0, ans=0.0 2024-08-18 19:10:35,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.43 vs. limit=12.0 2024-08-18 19:10:42,720 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 28 from Vox, 17 fro AS 2024-08-18 19:10:50,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4051480.0, ans=0.1 2024-08-18 19:11:02,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4050, loss[loss=0.1029, beats_loss=0.008276, ecapa_loss=0.000215, whisper_loss=0.09243, over 16891.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001463, whisper_loss=0.09144, over 3908296.28 frames. 
], batch size: 73, lr: 2.20e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:11:15,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4051680.0, ans=0.125 2024-08-18 19:11:20,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4051680.0, ans=0.04949747468305833 2024-08-18 19:11:37,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4051780.0, ans=0.125 2024-08-18 19:11:39,803 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 19:11:41,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4051880.0, ans=0.2 2024-08-18 19:11:45,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4051880.0, ans=0.0 2024-08-18 19:11:46,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0 2024-08-18 19:12:07,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4051980.0, ans=0.1 2024-08-18 19:12:09,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4100, loss[loss=0.1115, beats_loss=0.0101, ecapa_loss=0.0001729, whisper_loss=0.09964, over 21076.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01043, ecapa_loss=0.0001447, whisper_loss=0.0915, over 3886109.17 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:12:12,103 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
36 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 19:12:15,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4052080.0, ans=0.125 2024-08-18 19:12:19,683 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.351e+01 2.549e+01 2.874e+01 5.187e+01, threshold=5.098e+01, percent-clipped=1.0 2024-08-18 19:12:58,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4052380.0, ans=0.125 2024-08-18 19:13:03,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4052480.0, ans=0.09899494936611666 2024-08-18 19:13:15,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4150, loss[loss=0.1102, beats_loss=0.01115, ecapa_loss=0.0001458, whisper_loss=0.09755, over 23016.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.0104, ecapa_loss=0.0001456, whisper_loss=0.0916, over 3883064.12 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:13:24,251 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 19:13:28,475 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 19:13:36,487 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 19:13:37,843 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 19:13:38,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4052680.0, ans=0.125 2024-08-18 19:13:42,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.70 vs. 
limit=12.0 2024-08-18 19:13:47,583 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-18 19:13:50,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4052780.0, ans=0.0 2024-08-18 19:13:56,512 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 19:14:00,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4052880.0, ans=0.07 2024-08-18 19:14:03,216 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 19:14:21,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4200, loss[loss=0.09526, beats_loss=0.008281, ecapa_loss=0.0001634, whisper_loss=0.08535, over 19975.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01038, ecapa_loss=0.0001452, whisper_loss=0.09201, over 3903411.07 frames. ], batch size: 79, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:14:26,974 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-18 19:14:27,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4053080.0, ans=0.125 2024-08-18 19:14:32,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.283e+01 2.553e+01 2.911e+01 4.394e+01, threshold=5.106e+01, percent-clipped=0.0 2024-08-18 19:14:52,005 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 19 from Vox, 50 fro AS 2024-08-18 19:15:04,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4053380.0, ans=0.0 2024-08-18 19:15:27,043 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4250, loss[loss=0.1202, beats_loss=0.00665, ecapa_loss=0.000131, whisper_loss=0.1122, over 15300.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01043, ecapa_loss=0.0001452, whisper_loss=0.09128, over 3907941.60 frames. ], batch size: 58, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:15:29,467 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.77 vs. limit=6.0 2024-08-18 19:15:37,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4053580.0, ans=0.0 2024-08-18 19:15:59,548 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 19:16:06,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4053880.0, ans=0.125 2024-08-18 19:16:06,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4053880.0, ans=0.04949747468305833 2024-08-18 19:16:08,908 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-18 19:16:33,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4300, loss[loss=0.1013, beats_loss=0.01074, ecapa_loss=0.0001253, whisper_loss=0.0893, over 20328.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001453, whisper_loss=0.09131, over 3920641.81 frames. 
], batch size: 79, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:16:44,516 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.278e+01 2.525e+01 2.871e+01 4.782e+01, threshold=5.050e+01, percent-clipped=0.0 2024-08-18 19:16:59,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4054280.0, ans=0.0 2024-08-18 19:17:03,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4054280.0, ans=0.2 2024-08-18 19:17:18,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4054380.0, ans=0.0 2024-08-18 19:17:20,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4054380.0, ans=0.1 2024-08-18 19:17:26,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4054480.0, ans=0.1 2024-08-18 19:17:31,635 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:17:33,785 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 19:17:36,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2024-08-18 19:17:40,319 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4350, loss[loss=0.1151, beats_loss=0.009485, ecapa_loss=0.0001647, whisper_loss=0.1039, over 22477.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001453, whisper_loss=0.09069, over 3886012.53 frames. 
], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:17:53,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.08 vs. limit=22.5 2024-08-18 19:17:58,221 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2024-08-18 19:17:59,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4054680.0, ans=0.125 2024-08-18 19:18:08,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4054780.0, ans=0.125 2024-08-18 19:18:13,822 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 28 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-18 19:18:14,773 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-18 19:18:20,035 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 19:18:25,978 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0 2024-08-18 19:18:31,871 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-18 19:18:37,000 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 19:18:45,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4400, loss[loss=0.1235, beats_loss=0.00925, ecapa_loss=0.0001668, whisper_loss=0.1126, over 22790.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001451, whisper_loss=0.09131, over 3890253.77 frames. 
], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:18:45,737 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 19:18:47,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4055080.0, ans=0.125 2024-08-18 19:18:55,955 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.724e+01 2.281e+01 2.472e+01 2.660e+01 4.951e+01, threshold=4.945e+01, percent-clipped=0.0 2024-08-18 19:18:56,498 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:18:56,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4055080.0, ans=0.0 2024-08-18 19:19:08,336 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 19:19:22,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4055280.0, ans=0.125 2024-08-18 19:19:24,135 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-18 19:19:46,517 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 19:19:52,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4450, loss[loss=0.1176, beats_loss=0.009216, ecapa_loss=0.0001415, whisper_loss=0.107, over 22829.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01038, ecapa_loss=0.000145, whisper_loss=0.09124, over 3876039.81 frames. 
], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:19:56,843 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4055580.0, ans=0.2 2024-08-18 19:19:58,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4055580.0, ans=0.125 2024-08-18 19:20:06,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4055680.0, ans=0.125 2024-08-18 19:20:18,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4055780.0, ans=0.125 2024-08-18 19:20:23,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4055780.0, ans=0.1 2024-08-18 19:20:36,589 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2024-08-18 19:20:44,369 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-18 19:20:56,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.95 vs. limit=15.0 2024-08-18 19:20:59,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4056080.0, ans=0.05 2024-08-18 19:20:59,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4500, loss[loss=0.1166, beats_loss=0.009567, ecapa_loss=0.0001657, whisper_loss=0.1054, over 23645.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001441, whisper_loss=0.0905, over 3866847.78 frames. 
], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:21:10,759 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.274e+01 2.537e+01 2.836e+01 4.716e+01, threshold=5.074e+01, percent-clipped=0.0 2024-08-18 19:21:16,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4056180.0, ans=0.09899494936611666 2024-08-18 19:21:39,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4056380.0, ans=0.125 2024-08-18 19:21:40,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4056380.0, ans=0.1 2024-08-18 19:22:02,102 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4056480.0, ans=0.125 2024-08-18 19:22:07,182 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4550, loss[loss=0.1175, beats_loss=0.01084, ecapa_loss=0.000129, whisper_loss=0.1054, over 21788.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01035, ecapa_loss=0.000145, whisper_loss=0.09131, over 3881704.99 frames. ], batch size: 86, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:22:09,360 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-08-18 19:22:11,406 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 19:22:26,282 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 19:22:40,810 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
15 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-18 19:22:42,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4056780.0, ans=0.0 2024-08-18 19:23:02,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4056980.0, ans=0.0 2024-08-18 19:23:06,258 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 19:23:14,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4600, loss[loss=0.08578, beats_loss=0.01067, ecapa_loss=0.0001523, whisper_loss=0.07358, over 22687.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01043, ecapa_loss=0.0001445, whisper_loss=0.09022, over 3886112.25 frames. ], batch size: 94, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:23:15,610 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-18 19:23:18,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4057080.0, ans=0.125 2024-08-18 19:23:25,097 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.957e+01 2.308e+01 2.504e+01 2.960e+01 4.674e+01, threshold=5.007e+01, percent-clipped=0.0 2024-08-18 19:23:32,923 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-18 19:23:51,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4057280.0, ans=0.1 2024-08-18 19:23:55,888 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.789e+01 2024-08-18 19:24:05,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4057380.0, ans=0.125 2024-08-18 19:24:07,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4057480.0, ans=0.0 2024-08-18 19:24:09,945 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 17 from Vox, 18 fro AS 2024-08-18 19:24:12,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4057480.0, ans=0.125 2024-08-18 19:24:20,248 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4650, loss[loss=0.09877, beats_loss=0.01041, ecapa_loss=0.0001403, whisper_loss=0.08696, over 23031.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.000145, whisper_loss=0.09017, over 3895447.25 frames. ], batch size: 94, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:24:20,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4057580.0, ans=0.125 2024-08-18 19:24:24,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4057580.0, ans=0.1 2024-08-18 19:24:35,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4057680.0, ans=0.125 2024-08-18 19:24:36,271 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
24 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 19:24:36,913 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-18 19:24:39,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4057680.0, ans=0.0 2024-08-18 19:25:23,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4057980.0, ans=0.0 2024-08-18 19:25:26,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4700, loss[loss=0.1062, beats_loss=0.009098, ecapa_loss=0.0001676, whisper_loss=0.09543, over 22790.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001446, whisper_loss=0.09042, over 3877726.46 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:25:36,745 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.335e+01 2.621e+01 2.898e+01 4.887e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-18 19:25:40,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4058180.0, ans=22.5 2024-08-18 19:25:45,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4058180.0, ans=0.2 2024-08-18 19:25:55,622 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 19:26:02,498 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 19:26:17,873 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
20 from LS+wenet, 29 from Vox, 41 fro AS 2024-08-18 19:26:24,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4058480.0, ans=0.125 2024-08-18 19:26:28,461 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-18 19:26:30,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.81 vs. limit=15.0 2024-08-18 19:26:32,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4750, loss[loss=0.1053, beats_loss=0.008853, ecapa_loss=0.000174, whisper_loss=0.09475, over 18957.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001443, whisper_loss=0.08969, over 3864370.87 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:26:57,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4058680.0, ans=0.125 2024-08-18 19:26:59,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4058780.0, ans=15.0 2024-08-18 19:27:16,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4058880.0, ans=0.2 2024-08-18 19:27:20,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4058880.0, ans=0.1 2024-08-18 19:27:24,617 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-18 19:27:38,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4800, loss[loss=0.08704, beats_loss=0.01154, ecapa_loss=0.0001493, whisper_loss=0.07401, over 20546.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001444, whisper_loss=0.08972, over 3844957.01 frames. 
], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:27:49,123 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.300e+01 2.541e+01 2.799e+01 4.808e+02, threshold=5.082e+01, percent-clipped=2.0 2024-08-18 19:27:53,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4059180.0, ans=0.05 2024-08-18 19:27:57,835 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:28:02,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=12.0 2024-08-18 19:28:13,722 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 19:28:38,674 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 32 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 19:28:38,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4059480.0, ans=0.0 2024-08-18 19:28:39,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5 2024-08-18 19:28:40,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4059480.0, ans=0.1 2024-08-18 19:28:45,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4850, loss[loss=0.093, beats_loss=0.01081, ecapa_loss=0.0001166, whisper_loss=0.08103, over 14405.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.000144, whisper_loss=0.09092, over 3874910.18 frames. 
], batch size: 56, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:28:58,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4059680.0, ans=0.0 2024-08-18 19:29:38,085 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-18 19:29:46,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4059980.0, ans=0.125 2024-08-18 19:29:50,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4900, loss[loss=0.08872, beats_loss=0.01109, ecapa_loss=0.0001368, whisper_loss=0.07626, over 21204.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001446, whisper_loss=0.09075, over 3846715.88 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:29:58,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4060080.0, ans=0.125 2024-08-18 19:30:01,412 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.212e+01 2.467e+01 2.752e+01 9.926e+01, threshold=4.934e+01, percent-clipped=3.0 2024-08-18 19:30:09,019 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=15.0 2024-08-18 19:30:24,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4060280.0, ans=0.09899494936611666 2024-08-18 19:30:28,024 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 19:30:29,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4060380.0, ans=0.07 2024-08-18 19:30:32,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4060380.0, ans=0.125 2024-08-18 19:30:49,690 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 19:30:57,402 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 4950, loss[loss=0.1065, beats_loss=0.01002, ecapa_loss=0.0001444, whisper_loss=0.09503, over 22027.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01029, ecapa_loss=0.0001446, whisper_loss=0.09154, over 3845115.43 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:31:32,043 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 19:31:41,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4060880.0, ans=0.0 2024-08-18 19:31:43,860 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 19 from LS+wenet, 28 from Vox, 47 fro AS 2024-08-18 19:31:48,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4060880.0, ans=0.0 2024-08-18 19:32:03,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-18 19:32:03,829 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5000, loss[loss=0.1048, beats_loss=0.009776, ecapa_loss=0.0001478, whisper_loss=0.09354, over 21873.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001435, whisper_loss=0.09082, over 3873859.96 frames. 
], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:32:14,148 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.344e+01 2.610e+01 2.936e+01 3.838e+01, threshold=5.220e+01, percent-clipped=0.0 2024-08-18 19:32:14,482 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 34 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-18 19:32:22,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-18 19:32:27,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4061180.0, ans=0.07 2024-08-18 19:32:37,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=15.0 2024-08-18 19:32:41,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4061280.0, ans=0.0 2024-08-18 19:32:42,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4061380.0, ans=0.1 2024-08-18 19:32:43,545 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 19:32:52,865 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 30 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 19:32:53,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4061380.0, ans=0.07 2024-08-18 19:32:57,658 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-18 19:33:09,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5050, loss[loss=0.09437, beats_loss=0.01173, ecapa_loss=0.0001478, whisper_loss=0.08116, over 15226.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001429, whisper_loss=0.09049, over 3893757.81 frames. ], batch size: 63, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:33:11,785 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 30 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 19:33:19,793 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-18 19:33:21,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4061680.0, ans=0.0 2024-08-18 19:33:30,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.49 vs. limit=10.0 2024-08-18 19:33:31,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4061680.0, ans=0.2 2024-08-18 19:33:34,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4061780.0, ans=0.2 2024-08-18 19:33:57,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4061880.0, ans=0.125 2024-08-18 19:34:13,215 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-18 19:34:14,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5100, loss[loss=0.1101, beats_loss=0.008872, ecapa_loss=0.0001403, whisper_loss=0.09982, over 21510.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01044, ecapa_loss=0.0001437, whisper_loss=0.09143, over 3872798.09 frames. 
], batch size: 82, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:34:23,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4062080.0, ans=0.125 2024-08-18 19:34:23,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4062080.0, ans=0.5 2024-08-18 19:34:24,692 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.303e+01 2.611e+01 2.910e+01 2.012e+02, threshold=5.222e+01, percent-clipped=3.0 2024-08-18 19:34:25,253 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 19:34:39,363 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 19:34:46,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4062280.0, ans=0.125 2024-08-18 19:35:09,334 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 19:35:10,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4062480.0, ans=0.0 2024-08-18 19:35:12,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-08-18 19:35:14,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4062480.0, ans=0.07 2024-08-18 19:35:15,712 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 22 from LS+wenet, 21 from Vox, 49 fro AS 2024-08-18 19:35:19,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5150, loss[loss=0.09537, beats_loss=0.01212, ecapa_loss=0.0001288, whisper_loss=0.08196, over 22857.00 frames. 
], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.000143, whisper_loss=0.09151, over 3889764.22 frames. ], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:35:43,004 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 19:35:57,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4062880.0, ans=0.0 2024-08-18 19:36:07,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4062880.0, ans=0.2 2024-08-18 19:36:24,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5200, loss[loss=0.08915, beats_loss=0.01017, ecapa_loss=0.0001284, whisper_loss=0.07769, over 20077.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001429, whisper_loss=0.09096, over 3880713.46 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:36:25,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4063080.0, ans=0.1 2024-08-18 19:36:27,188 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 19:36:27,479 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=6.228e+00 2024-08-18 19:36:28,537 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 14 from Vox, 50 fro AS 2024-08-18 19:36:28,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4063080.0, ans=0.125 2024-08-18 19:36:34,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4063080.0, ans=0.0 2024-08-18 19:36:34,932 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.237e+01 2.499e+01 2.869e+01 3.918e+01, threshold=4.998e+01, percent-clipped=0.0 2024-08-18 19:36:38,868 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 19:37:00,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4063280.0, ans=0.125 2024-08-18 19:37:16,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4063480.0, ans=0.2 2024-08-18 19:37:17,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4063480.0, ans=0.125 2024-08-18 19:37:25,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4063480.0, ans=0.2 2024-08-18 19:37:27,229 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2024-08-18 19:37:29,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5250, loss[loss=0.1062, beats_loss=0.009115, ecapa_loss=0.0001535, whisper_loss=0.09558, over 20872.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001428, whisper_loss=0.09086, over 3890859.23 frames. 
], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:37:35,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4063580.0, ans=0.0 2024-08-18 19:37:44,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4063680.0, ans=0.0 2024-08-18 19:37:52,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4063680.0, ans=0.025 2024-08-18 19:37:55,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-08-18 19:37:59,896 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 19:38:09,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4063880.0, ans=0.0 2024-08-18 19:38:18,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4063880.0, ans=0.125 2024-08-18 19:38:34,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5300, loss[loss=0.08945, beats_loss=0.01191, ecapa_loss=0.0001142, whisper_loss=0.0764, over 16730.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.091, over 3883287.63 frames. ], batch size: 65, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:38:44,199 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 19:38:45,256 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.271e+01 2.459e+01 2.862e+01 3.681e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-18 19:39:05,355 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
15 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-18 19:39:13,006 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-18 19:39:19,402 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 19:39:40,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5350, loss[loss=0.09743, beats_loss=0.01059, ecapa_loss=0.0001369, whisper_loss=0.08547, over 20941.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.000143, whisper_loss=0.09076, over 3861325.11 frames. ], batch size: 83, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:39:44,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4064580.0, ans=0.1 2024-08-18 19:40:00,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4064680.0, ans=0.125 2024-08-18 19:40:22,280 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 19:40:25,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.33 vs. limit=15.0 2024-08-18 19:40:33,730 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 19:40:45,349 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5400, loss[loss=0.1074, beats_loss=0.00988, ecapa_loss=0.0001508, whisper_loss=0.096, over 21071.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01039, ecapa_loss=0.0001426, whisper_loss=0.09094, over 3876385.33 frames. 
], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:40:50,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4065080.0, ans=0.125 2024-08-18 19:40:55,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.323e+01 2.486e+01 2.757e+01 7.615e+01, threshold=4.971e+01, percent-clipped=1.0 2024-08-18 19:40:58,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4065180.0, ans=0.125 2024-08-18 19:41:06,020 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 19:41:06,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4065180.0, ans=0.125 2024-08-18 19:41:10,804 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.767e+05 2024-08-18 19:41:12,812 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 19:41:13,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4065280.0, ans=0.125 2024-08-18 19:41:33,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4065380.0, ans=0.125 2024-08-18 19:41:41,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4065480.0, ans=0.2 2024-08-18 19:41:50,149 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5450, loss[loss=0.1082, beats_loss=0.01134, ecapa_loss=0.0001444, whisper_loss=0.09541, over 19346.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.000142, whisper_loss=0.09058, over 3867897.92 frames. 
], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:41:58,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-18 19:41:59,175 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-18 19:42:06,927 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 19:42:11,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4065680.0, ans=0.2 2024-08-18 19:42:14,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4065780.0, ans=0.0 2024-08-18 19:42:29,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4065880.0, ans=0.125 2024-08-18 19:42:43,003 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-18 19:42:54,890 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5500, loss[loss=0.1112, beats_loss=0.008378, ecapa_loss=0.0001307, whisper_loss=0.1015, over 17998.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.000143, whisper_loss=0.09019, over 3873406.53 frames. 
], batch size: 65, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:43:03,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4066080.0, ans=0.125 2024-08-18 19:43:05,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.744e+01 2.282e+01 2.535e+01 2.838e+01 1.372e+02, threshold=5.070e+01, percent-clipped=2.0 2024-08-18 19:43:10,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-08-18 19:43:13,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4066180.0, ans=0.95 2024-08-18 19:43:33,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4066280.0, ans=0.0 2024-08-18 19:43:43,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4066380.0, ans=0.125 2024-08-18 19:43:46,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4066380.0, ans=0.1 2024-08-18 19:43:48,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.85 vs. limit=15.0 2024-08-18 19:43:59,729 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-18 19:44:02,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5550, loss[loss=0.08957, beats_loss=0.01286, ecapa_loss=0.0001269, whisper_loss=0.07544, over 23544.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01053, ecapa_loss=0.0001432, whisper_loss=0.0903, over 3877674.91 frames. 
], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:44:04,980 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 19:44:10,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4066580.0, ans=0.125 2024-08-18 19:44:11,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4066580.0, ans=0.1 2024-08-18 19:44:13,771 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-18 19:44:21,608 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-18 19:44:25,469 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 19:44:46,059 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2024-08-18 19:44:47,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4066880.0, ans=0.125 2024-08-18 19:45:00,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.65 vs. limit=10.0 2024-08-18 19:45:05,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4066980.0, ans=0.125 2024-08-18 19:45:14,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5600, loss[loss=0.07812, beats_loss=0.01407, ecapa_loss=0.0001196, whisper_loss=0.06286, over 22168.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001435, whisper_loss=0.09006, over 3893539.95 frames. 
], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:45:19,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4067080.0, ans=0.125 2024-08-18 19:45:25,979 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.465e+01 2.692e+01 2.981e+01 3.503e+02, threshold=5.385e+01, percent-clipped=2.0 2024-08-18 19:45:29,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4067180.0, ans=0.125 2024-08-18 19:45:41,026 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-18 19:45:51,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4067280.0, ans=0.1 2024-08-18 19:45:53,089 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 24 from Vox, 16 fro AS 2024-08-18 19:46:02,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4067380.0, ans=0.125 2024-08-18 19:46:27,896 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-18 19:46:28,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4067580.0, ans=0.125 2024-08-18 19:46:28,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5650, loss[loss=0.1165, beats_loss=0.01105, ecapa_loss=0.0001122, whisper_loss=0.1043, over 23988.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001436, whisper_loss=0.09, over 3914859.54 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:46:34,459 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-18 19:46:37,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4067580.0, ans=0.0 2024-08-18 19:46:44,033 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-18 19:46:49,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4067680.0, ans=0.2 2024-08-18 19:47:00,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4067780.0, ans=0.0 2024-08-18 19:47:04,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4067780.0, ans=0.0 2024-08-18 19:47:08,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4067780.0, ans=0.2 2024-08-18 19:47:25,178 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 19:47:44,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4068080.0, ans=0.125 2024-08-18 19:47:45,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5700, loss[loss=0.1313, beats_loss=0.008378, ecapa_loss=0.0001519, whisper_loss=0.1214, over 14154.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01057, ecapa_loss=0.0001439, whisper_loss=0.08991, over 3909301.21 frames. ], batch size: 55, lr: 2.19e-03, grad_scale: 1.152921504606847e+18 2024-08-18 19:47:51,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.76 vs. 
limit=22.5 2024-08-18 19:47:55,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4068080.0, ans=0.1 2024-08-18 19:47:58,013 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.343e+01 2.551e+01 2.885e+01 3.907e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-18 19:48:00,449 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 19:48:07,380 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 19:48:10,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4068180.0, ans=0.0 2024-08-18 19:48:14,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4068180.0, ans=0.2 2024-08-18 19:48:25,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.20 vs. limit=22.5 2024-08-18 19:48:44,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=15.0 2024-08-18 19:48:51,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-18 19:48:56,467 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 14 from Vox, 28 fro AS 2024-08-18 19:48:58,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4068480.0, ans=0.0 2024-08-18 19:49:00,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5750, loss[loss=0.09177, beats_loss=0.009672, ecapa_loss=0.0001224, whisper_loss=0.08088, over 19061.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.000144, whisper_loss=0.09038, over 3903089.74 frames. ], batch size: 72, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:49:03,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4068580.0, ans=0.1 2024-08-18 19:49:03,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-08-18 19:49:20,812 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 19:49:24,353 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-18 19:49:34,341 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 19:50:06,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4068980.0, ans=0.5 2024-08-18 19:50:22,529 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5800, loss[loss=0.09817, beats_loss=0.01049, ecapa_loss=0.0001866, whisper_loss=0.08581, over 17428.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001448, whisper_loss=0.09017, over 3891001.26 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:50:32,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4069080.0, ans=0.125 2024-08-18 19:50:33,784 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4069080.0, ans=0.0 2024-08-18 19:50:34,121 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. 
limit=15.0 2024-08-18 19:50:35,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.304e+01 2.520e+01 2.839e+01 4.509e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 19:50:41,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4069180.0, ans=0.125 2024-08-18 19:51:07,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4069380.0, ans=0.125 2024-08-18 19:51:15,242 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-18 19:51:33,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4069480.0, ans=0.025 2024-08-18 19:51:37,336 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5850, loss[loss=0.09059, beats_loss=0.01162, ecapa_loss=0.0001455, whisper_loss=0.07751, over 18359.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001455, whisper_loss=0.09018, over 3883977.83 frames. ], batch size: 73, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:51:47,232 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-18 19:51:58,940 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-18 19:52:11,471 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 19:52:13,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4069780.0, ans=0.125 2024-08-18 19:52:19,320 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 19:52:30,459 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-18 19:52:45,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=15.0 2024-08-18 19:52:49,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4069980.0, ans=0.125 2024-08-18 19:52:51,747 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5900, loss[loss=0.08293, beats_loss=0.01041, ecapa_loss=0.0001487, whisper_loss=0.07103, over 14285.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001446, whisper_loss=0.08952, over 3842050.22 frames. ], batch size: 59, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:53:03,422 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.303e+01 2.495e+01 2.775e+01 3.811e+01, threshold=4.989e+01, percent-clipped=0.0 2024-08-18 19:53:18,070 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.59 vs. limit=15.0 2024-08-18 19:53:29,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=22.5 2024-08-18 19:53:58,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 5950, loss[loss=0.09436, beats_loss=0.01053, ecapa_loss=0.0001345, whisper_loss=0.08249, over 21781.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0106, ecapa_loss=0.0001443, whisper_loss=0.08948, over 3842010.10 frames. 
], batch size: 87, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:53:59,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.28 vs. limit=22.5 2024-08-18 19:54:24,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.39 vs. limit=22.5 2024-08-18 19:54:30,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4070780.0, ans=0.125 2024-08-18 19:54:47,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0 2024-08-18 19:55:04,512 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6000, loss[loss=0.07914, beats_loss=0.01452, ecapa_loss=0.0001512, whisper_loss=0.06311, over 16605.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001449, whisper_loss=0.09053, over 3843165.83 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:55:04,513 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 19:55:42,611 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on ASR_libri: loss=0.2546, beats_loss=0, ecapa_loss=0.0005279, whisper_loss=0.2493, over 922467.00 frames. 2024-08-18 19:55:59,690 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on SV_voxceleb1: loss=0.003985, beats_loss=0, ecapa_loss=0.0003985, whisper_loss=0, over 939242.00 frames. 2024-08-18 19:57:44,525 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-18 19:57:44,538 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 19:57:51,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4071080.0, ans=0.1 2024-08-18 19:57:52,398 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 19:57:54,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4071080.0, ans=0.125 2024-08-18 19:57:56,135 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.336e+01 2.599e+01 2.933e+01 4.741e+01, threshold=5.199e+01, percent-clipped=0.0 2024-08-18 19:58:01,661 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-18 19:58:04,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4071180.0, ans=0.1 2024-08-18 19:58:16,601 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 19:58:16,896 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.945e-01 2024-08-18 19:58:17,755 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 19:58:20,165 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-18 19:58:32,418 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 19:58:40,222 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 19:58:47,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4071480.0, ans=0.1 2024-08-18 19:58:48,531 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-18 19:58:52,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6050, loss[loss=0.1095, beats_loss=0.01121, ecapa_loss=0.000128, whisper_loss=0.09706, over 17593.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001445, whisper_loss=0.09069, over 3838769.46 frames. ], batch size: 69, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 19:58:56,586 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-18 19:58:57,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4071580.0, ans=0.0 2024-08-18 19:59:06,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4071680.0, ans=0.05 2024-08-18 19:59:07,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4071680.0, ans=0.125 2024-08-18 19:59:10,081 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 19:59:17,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4071680.0, ans=0.0 2024-08-18 19:59:18,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4071780.0, ans=0.125 2024-08-18 19:59:36,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=4071880.0, ans=15.0 2024-08-18 19:59:38,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4071880.0, ans=0.0 2024-08-18 19:59:56,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4071980.0, ans=0.125 2024-08-18 19:59:59,692 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6100, loss[loss=0.1054, beats_loss=0.01055, ecapa_loss=0.0001221, whisper_loss=0.0936, over 21074.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01054, ecapa_loss=0.0001442, whisper_loss=0.08999, over 3856939.03 frames. 
], batch size: 81, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:00:10,578 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4072080.0, ans=0.125 2024-08-18 20:00:12,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.299e+01 2.594e+01 2.933e+01 3.314e+02, threshold=5.188e+01, percent-clipped=1.0 2024-08-18 20:00:17,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4072180.0, ans=0.125 2024-08-18 20:00:23,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4072180.0, ans=0.125 2024-08-18 20:00:28,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4072280.0, ans=0.125 2024-08-18 20:00:34,553 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-18 20:00:43,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2024-08-18 20:00:46,556 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.783e+00 2024-08-18 20:00:52,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4072480.0, ans=0.0 2024-08-18 20:01:01,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4072480.0, ans=0.125 2024-08-18 20:01:05,027 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
25 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 20:01:06,152 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6150, loss[loss=0.1041, beats_loss=0.01047, ecapa_loss=0.0001251, whisper_loss=0.09241, over 20659.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001446, whisper_loss=0.08989, over 3867501.00 frames. ], batch size: 81, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:01:12,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4072580.0, ans=0.125 2024-08-18 20:01:16,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2024-08-18 20:01:19,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4072680.0, ans=0.125 2024-08-18 20:01:20,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4072680.0, ans=0.125 2024-08-18 20:01:59,284 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 20:02:03,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=4072980.0, ans=22.5 2024-08-18 20:02:13,427 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6200, loss[loss=0.1037, beats_loss=0.009381, ecapa_loss=0.0001656, whisper_loss=0.09268, over 19885.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001443, whisper_loss=0.09, over 3882445.01 frames. 
], batch size: 78, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:02:14,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4073080.0, ans=0.0 2024-08-18 20:02:17,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4073080.0, ans=0.2 2024-08-18 20:02:22,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2024-08-18 20:02:26,337 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.303e+01 2.554e+01 2.870e+01 1.661e+02, threshold=5.109e+01, percent-clipped=2.0 2024-08-18 20:02:32,415 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 20:02:36,561 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 27 from Vox, 43 fro AS 2024-08-18 20:02:39,565 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 20:02:45,639 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 20:02:49,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4073280.0, ans=10.0 2024-08-18 20:02:51,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4073280.0, ans=0.125 2024-08-18 20:02:53,234 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-18 20:02:55,782 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-18 20:02:59,198 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
26 from LS+wenet, 26 from Vox, 39 fro AS 2024-08-18 20:03:08,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2024-08-18 20:03:09,618 INFO [train_multi_KD3.py:844] (3/4) A total of 98 cuts. 31 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-18 20:03:09,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4073480.0, ans=0.0 2024-08-18 20:03:11,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.77 vs. limit=22.5 2024-08-18 20:03:21,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=15.0 2024-08-18 20:03:23,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4073580.0, ans=0.125 2024-08-18 20:03:24,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6250, loss[loss=0.101, beats_loss=0.01137, ecapa_loss=0.0001757, whisper_loss=0.08787, over 21809.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.09019, over 3897899.71 frames. 
], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:03:34,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4073580.0, ans=0.125 2024-08-18 20:03:51,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4073780.0, ans=0.1 2024-08-18 20:03:55,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4073780.0, ans=0.015 2024-08-18 20:04:02,817 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4073780.0, ans=0.125 2024-08-18 20:04:14,676 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:04:36,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6300, loss[loss=0.09639, beats_loss=0.01178, ecapa_loss=0.0001116, whisper_loss=0.08349, over 19002.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01045, ecapa_loss=0.0001452, whisper_loss=0.09091, over 3903030.71 frames. ], batch size: 73, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:04:49,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.375e+01 2.574e+01 2.890e+01 4.000e+02, threshold=5.149e+01, percent-clipped=1.0 2024-08-18 20:05:02,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4074180.0, ans=0.1 2024-08-18 20:05:09,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. 
limit=15.0 2024-08-18 20:05:26,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4074380.0, ans=0.0 2024-08-18 20:05:28,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4074380.0, ans=0.2 2024-08-18 20:05:29,587 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 from AS 2024-08-18 20:05:45,079 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6350, loss[loss=0.09617, beats_loss=0.009492, ecapa_loss=0.0001387, whisper_loss=0.08529, over 17935.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01055, ecapa_loss=0.0001443, whisper_loss=0.09032, over 3888292.04 frames. ], batch size: 73, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:05:47,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4074580.0, ans=0.0 2024-08-18 20:05:49,805 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 from AS 2024-08-18 20:05:50,907 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 19 from Vox, 27 from AS 2024-08-18 20:06:24,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4074880.0, ans=0.1 2024-08-18 20:06:27,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4074880.0, ans=0.125 2024-08-18 20:06:35,455 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 7 from Vox, 32 from AS 2024-08-18 20:06:39,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4074980.0, ans=0.0 2024-08-18 20:06:40,624 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
16 from LS+wenet, 30 from Vox, 42 from AS 2024-08-18 20:06:40,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4074980.0, ans=0.125 2024-08-18 20:06:50,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6400, loss[loss=0.109, beats_loss=0.008132, ecapa_loss=0.0001544, whisper_loss=0.09932, over 19381.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.0106, ecapa_loss=0.0001446, whisper_loss=0.08955, over 3920960.83 frames. ], batch size: 80, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:07:02,019 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.365e+01 2.555e+01 2.895e+01 7.791e+01, threshold=5.110e+01, percent-clipped=1.0 2024-08-18 20:07:19,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4075280.0, ans=0.0 2024-08-18 20:07:20,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4075280.0, ans=0.125 2024-08-18 20:07:24,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4075280.0, ans=0.0 2024-08-18 20:07:25,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4075280.0, ans=0.125 2024-08-18 20:07:36,390 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 from AS 2024-08-18 20:07:53,614 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.12 vs. limit=15.0 2024-08-18 20:07:54,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6450, loss[loss=0.1049, beats_loss=0.01255, ecapa_loss=0.0001266, whisper_loss=0.09104, over 23130.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01059, ecapa_loss=0.0001441, whisper_loss=0.08997, over 3927080.69 frames. ], batch size: 93, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:07:54,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4075580.0, ans=0.125 2024-08-18 20:08:07,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4075680.0, ans=0.2 2024-08-18 20:08:08,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4075680.0, ans=0.125 2024-08-18 20:08:09,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4075680.0, ans=0.2 2024-08-18 20:08:14,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4075680.0, ans=0.125 2024-08-18 20:08:17,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4075680.0, ans=0.0 2024-08-18 20:08:20,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4075780.0, ans=0.125 2024-08-18 20:08:27,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-18 20:08:37,197 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 20:08:44,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4075980.0, ans=0.0 2024-08-18 20:08:56,322 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
12 from LS+wenet, 19 from Vox, 26 from AS 2024-08-18 20:08:57,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6500, loss[loss=0.06719, beats_loss=0.01099, ecapa_loss=0.0001222, whisper_loss=0.05498, over 14680.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01061, ecapa_loss=0.0001433, whisper_loss=0.08978, over 3915510.23 frames. ], batch size: 57, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:09:08,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.695e+01 2.274e+01 2.478e+01 2.663e+01 4.004e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 20:09:08,981 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 from AS 2024-08-18 20:09:14,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4076180.0, ans=0.2 2024-08-18 20:09:23,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4076280.0, ans=0.0 2024-08-18 20:09:27,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4076280.0, ans=0.0 2024-08-18 20:09:31,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4076280.0, ans=0.125 2024-08-18 20:09:38,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4076380.0, ans=0.0 2024-08-18 20:09:39,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4076380.0, ans=0.0 2024-08-18 20:09:41,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.27 vs. 
limit=15.0 2024-08-18 20:10:01,927 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6550, loss[loss=0.105, beats_loss=0.009407, ecapa_loss=0.0001392, whisper_loss=0.09422, over 19075.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01056, ecapa_loss=0.0001441, whisper_loss=0.09052, over 3911973.35 frames. ], batch size: 75, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:10:04,355 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 20 from Vox, 19 from AS 2024-08-18 20:10:25,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4076680.0, ans=0.125 2024-08-18 20:10:27,812 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 20 from Vox, 44 from AS 2024-08-18 20:10:29,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4076780.0, ans=0.2 2024-08-18 20:10:38,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-18 20:10:40,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4076880.0, ans=0.07 2024-08-18 20:10:44,614 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
18 from LS+wenet, 14 from Vox, 21 from AS 2024-08-18 20:10:47,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4076880.0, ans=0.125 2024-08-18 20:10:51,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4076880.0, ans=0.125 2024-08-18 20:10:52,719 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4076980.0, ans=0.1 2024-08-18 20:10:54,879 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 from AS 2024-08-18 20:10:57,629 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 21 from Vox, 44 from AS 2024-08-18 20:10:58,939 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-18 20:11:00,311 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4076980.0, ans=0.125 2024-08-18 20:11:06,264 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6600, loss[loss=0.1227, beats_loss=0.009458, ecapa_loss=0.0001681, whisper_loss=0.1115, over 15615.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001451, whisper_loss=0.0916, over 3938711.48 frames. 
], batch size: 63, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:11:11,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4077080.0, ans=0.125 2024-08-18 20:11:17,598 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.939e+01 2.459e+01 2.687e+01 3.202e+01 5.546e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-18 20:11:22,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4077180.0, ans=0.1 2024-08-18 20:11:27,832 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 20:11:30,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4077280.0, ans=0.125 2024-08-18 20:11:37,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.86 vs. limit=15.0 2024-08-18 20:11:37,961 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 from AS 2024-08-18 20:11:43,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4077380.0, ans=0.0 2024-08-18 20:12:01,162 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 14 from Vox, 37 from AS 2024-08-18 20:12:10,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6650, loss[loss=0.09832, beats_loss=0.0107, ecapa_loss=0.000123, whisper_loss=0.08639, over 23518.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001453, whisper_loss=0.09101, over 3930702.98 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:12:15,364 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
39 from LS+wenet, 15 from Vox, 35 from AS 2024-08-18 20:12:17,790 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 from AS 2024-08-18 20:12:18,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4077580.0, ans=0.2 2024-08-18 20:12:18,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4077580.0, ans=0.2 2024-08-18 20:12:27,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4077680.0, ans=0.0 2024-08-18 20:12:30,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2024-08-18 20:12:48,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4077880.0, ans=0.125 2024-08-18 20:13:09,352 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 14 from Vox, 31 from AS 2024-08-18 20:13:14,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6700, loss[loss=0.1108, beats_loss=0.01016, ecapa_loss=0.0001467, whisper_loss=0.09916, over 22488.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001445, whisper_loss=0.09168, over 3938749.82 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:13:18,217 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 from AS 2024-08-18 20:13:19,514 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
30 from LS+wenet, 21 from Vox, 26 from AS 2024-08-18 20:13:23,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4078080.0, ans=0.125 2024-08-18 20:13:26,210 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.353e+01 2.592e+01 3.066e+01 1.135e+02, threshold=5.185e+01, percent-clipped=5.0 2024-08-18 20:13:35,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4078180.0, ans=0.125 2024-08-18 20:13:37,747 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 19 from Vox, 30 from AS 2024-08-18 20:13:45,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4078280.0, ans=0.2 2024-08-18 20:13:46,043 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0 2024-08-18 20:13:57,106 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 from AS 2024-08-18 20:13:58,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4078380.0, ans=0.0 2024-08-18 20:14:03,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4078380.0, ans=0.125 2024-08-18 20:14:07,568 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 20 from Vox, 39 from AS 2024-08-18 20:14:11,597 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 from AS 2024-08-18 20:14:12,722 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
24 from LS+wenet, 23 from Vox, 32 from AS 2024-08-18 20:14:13,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-08-18 20:14:14,204 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 12 from Vox, 29 from AS 2024-08-18 20:14:16,744 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 24 from Vox, 21 from AS 2024-08-18 20:14:19,046 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6750, loss[loss=0.1201, beats_loss=0.009679, ecapa_loss=0.0001186, whisper_loss=0.1093, over 18571.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01045, ecapa_loss=0.0001449, whisper_loss=0.09064, over 3902008.46 frames. ], batch size: 67, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:14:27,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4078580.0, ans=0.125 2024-08-18 20:14:31,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4078680.0, ans=0.0 2024-08-18 20:14:38,910 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 from AS 2024-08-18 20:15:01,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4078880.0, ans=0.125 2024-08-18 20:15:01,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4078880.0, ans=0.125 2024-08-18 20:15:01,901 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
36 from LS+wenet, 18 from Vox, 36 from AS 2024-08-18 20:15:02,263 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.084e+00 2024-08-18 20:15:06,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4078880.0, ans=0.1 2024-08-18 20:15:24,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6800, loss[loss=0.1132, beats_loss=0.008807, ecapa_loss=0.0001824, whisper_loss=0.1025, over 22531.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001436, whisper_loss=0.09091, over 3925718.37 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:15:29,132 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 16 from Vox, 28 from AS 2024-08-18 20:15:35,435 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.259e+01 2.465e+01 2.807e+01 3.943e+01, threshold=4.931e+01, percent-clipped=0.0 2024-08-18 20:15:38,113 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 from AS 2024-08-18 20:15:45,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4079180.0, ans=0.0 2024-08-18 20:15:48,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4079180.0, ans=0.1 2024-08-18 20:16:09,570 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 30 from Vox, 33 from AS 2024-08-18 20:16:12,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4079380.0, ans=0.125 2024-08-18 20:16:27,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4079580.0, ans=0.05 2024-08-18 20:16:28,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6850, loss[loss=0.09676, beats_loss=0.01215, ecapa_loss=0.0001298, whisper_loss=0.08331, over 16596.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001431, whisper_loss=0.09075, over 3916472.98 frames. ], batch size: 65, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:16:28,776 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 21 from Vox, 37 from AS 2024-08-18 20:16:29,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.64 vs. limit=10.0 2024-08-18 20:16:42,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4079680.0, ans=0.125 2024-08-18 20:16:44,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4079680.0, ans=0.125 2024-08-18 20:16:49,280 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 from AS 2024-08-18 20:17:01,103 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 from AS 2024-08-18 20:17:02,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4079780.0, ans=0.125 2024-08-18 20:17:07,695 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
12 from LS+wenet, 15 from Vox, 33 from AS 2024-08-18 20:17:12,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4079880.0, ans=0.125 2024-08-18 20:17:13,566 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 24 from Vox, 31 from AS 2024-08-18 20:17:32,780 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 24 from Vox, 21 from AS 2024-08-18 20:17:34,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6900, loss[loss=0.08812, beats_loss=0.0128, ecapa_loss=0.0001556, whisper_loss=0.07377, over 20647.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.000143, whisper_loss=0.09091, over 3904184.21 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:17:38,193 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4080080.0, ans=0.1 2024-08-18 20:17:46,854 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.343e+01 2.661e+01 3.031e+01 5.071e+01, threshold=5.321e+01, percent-clipped=1.0 2024-08-18 20:17:47,037 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 25 from Vox, 33 from AS 2024-08-18 20:17:54,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4080180.0, ans=0.1 2024-08-18 20:17:57,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4080180.0, ans=0.125 2024-08-18 20:17:57,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4080180.0, ans=0.025 2024-08-18 20:17:58,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.71 vs. 
limit=12.0 2024-08-18 20:18:06,268 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 16 from Vox, 48 from AS 2024-08-18 20:18:16,517 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 24 from Vox, 25 from AS 2024-08-18 20:18:20,067 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 17 from Vox, 43 from AS 2024-08-18 20:18:20,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4080380.0, ans=0.0 2024-08-18 20:18:20,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4080380.0, ans=0.125 2024-08-18 20:18:38,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 6950, loss[loss=0.1097, beats_loss=0.009621, ecapa_loss=0.0001604, whisper_loss=0.0985, over 22385.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.09055, over 3912095.96 frames. ], batch size: 89, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:18:57,055 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 18 from Vox, 43 from AS 2024-08-18 20:19:07,967 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2024-08-18 20:19:13,520 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 23 from Vox, 33 from AS 2024-08-18 20:19:13,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4080780.0, ans=0.125 2024-08-18 20:19:23,949 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 27 from Vox, 37 from AS 2024-08-18 20:19:40,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4080980.0, ans=0.0 2024-08-18 20:19:43,051 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7000, loss[loss=0.09973, beats_loss=0.01058, ecapa_loss=0.0001644, whisper_loss=0.08751, over 21390.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001425, whisper_loss=0.0905, over 3906367.95 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:19:48,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4081080.0, ans=0.125 2024-08-18 20:19:54,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.235e+01 2.482e+01 2.791e+01 3.681e+01, threshold=4.964e+01, percent-clipped=0.0 2024-08-18 20:20:24,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4081380.0, ans=0.1 2024-08-18 20:20:27,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4081380.0, ans=0.2 2024-08-18 20:20:38,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4081480.0, ans=0.0 2024-08-18 20:20:47,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7050, loss[loss=0.1096, beats_loss=0.00943, ecapa_loss=0.0001392, whisper_loss=0.09881, over 20319.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01056, ecapa_loss=0.0001438, whisper_loss=0.09007, over 3904725.29 frames. 
], batch size: 80, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:20:54,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4081580.0, ans=0.2 2024-08-18 20:21:17,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4081780.0, ans=0.1 2024-08-18 20:21:19,432 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. limit=15.0 2024-08-18 20:21:23,885 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 16 from Vox, 20 from AS 2024-08-18 20:21:40,836 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:21:47,164 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 19 from Vox, 36 from AS 2024-08-18 20:21:48,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4081980.0, ans=0.125 2024-08-18 20:21:52,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7100, loss[loss=0.09093, beats_loss=0.01087, ecapa_loss=0.000131, whisper_loss=0.07875, over 14502.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.000144, whisper_loss=0.09029, over 3906104.04 frames. 
], batch size: 55, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:22:01,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4082080.0, ans=0.0 2024-08-18 20:22:04,118 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.313e+01 2.533e+01 2.792e+01 3.997e+01, threshold=5.065e+01, percent-clipped=0.0 2024-08-18 20:22:04,900 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.88 vs. limit=22.5 2024-08-18 20:22:05,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.out_whiten.whitening_limit, batch_count=4082180.0, ans=8.0 2024-08-18 20:22:37,130 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 from AS 2024-08-18 20:22:57,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7150, loss[loss=0.09086, beats_loss=0.009128, ecapa_loss=0.0001561, whisper_loss=0.08017, over 17408.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001429, whisper_loss=0.09092, over 3930766.44 frames. ], batch size: 71, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:23:04,090 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-18 20:23:12,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0 2024-08-18 20:23:33,779 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. 
limit=15.0 2024-08-18 20:23:36,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4082880.0, ans=0.0 2024-08-18 20:23:40,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.69 vs. limit=10.0 2024-08-18 20:23:51,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4082980.0, ans=0.125 2024-08-18 20:23:54,151 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 25 from LS+wenet, 23 from Vox, 29 from AS 2024-08-18 20:23:55,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4082980.0, ans=0.2 2024-08-18 20:23:58,348 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 21 from Vox, 38 from AS 2024-08-18 20:23:58,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4082980.0, ans=0.5 2024-08-18 20:24:02,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=4083080.0, ans=0.5 2024-08-18 20:24:03,595 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7200, loss[loss=0.1091, beats_loss=0.009218, ecapa_loss=0.0001358, whisper_loss=0.09849, over 18030.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01056, ecapa_loss=0.0001421, whisper_loss=0.09069, over 3929523.69 frames. 
], batch size: 68, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:24:14,865 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.258e+01 2.559e+01 2.767e+01 6.438e+01, threshold=5.118e+01, percent-clipped=2.0 2024-08-18 20:24:16,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4083180.0, ans=0.1 2024-08-18 20:24:23,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.76 vs. limit=22.5 2024-08-18 20:24:41,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4083380.0, ans=0.125 2024-08-18 20:24:42,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4083380.0, ans=0.0 2024-08-18 20:25:07,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4083580.0, ans=0.0 2024-08-18 20:25:08,626 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7250, loss[loss=0.11, beats_loss=0.01116, ecapa_loss=0.0001246, whisper_loss=0.09757, over 19615.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01053, ecapa_loss=0.000142, whisper_loss=0.09082, over 3918049.13 frames. 
], batch size: 77, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:25:19,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4083580.0, ans=0.1 2024-08-18 20:25:24,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4083680.0, ans=0.0 2024-08-18 20:25:45,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4083780.0, ans=0.2 2024-08-18 20:25:46,577 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 38 from LS+wenet, 17 from Vox, 39 from AS 2024-08-18 20:25:55,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4083880.0, ans=0.1 2024-08-18 20:26:07,405 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-08-18 20:26:14,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4083980.0, ans=0.0 2024-08-18 20:26:16,588 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7300, loss[loss=0.09837, beats_loss=0.01182, ecapa_loss=0.0001412, whisper_loss=0.08514, over 21659.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01055, ecapa_loss=0.0001427, whisper_loss=0.09119, over 3939686.24 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:26:16,901 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 11 from Vox, 30 from AS 2024-08-18 20:26:18,294 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 17 from Vox, 29 from AS 2024-08-18 20:26:31,273 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 12 from Vox, 40 from AS 2024-08-18 20:26:32,311 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.881e+01 2.408e+01 2.606e+01 2.923e+01 5.019e+01, threshold=5.211e+01, percent-clipped=0.0 2024-08-18 20:26:36,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4084180.0, ans=0.1 2024-08-18 20:26:48,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4084280.0, ans=0.5 2024-08-18 20:26:55,961 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.209e+01 2024-08-18 20:27:09,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4084380.0, ans=0.125 2024-08-18 20:27:17,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=15.0 2024-08-18 20:27:33,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7350, loss[loss=0.1002, beats_loss=0.01214, ecapa_loss=0.0001344, whisper_loss=0.0867, over 20568.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01048, ecapa_loss=0.0001442, whisper_loss=0.09115, over 3933803.22 frames. ], batch size: 82, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:27:34,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.77 vs. 
limit=15.0 2024-08-18 20:27:35,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4084580.0, ans=0.125 2024-08-18 20:28:06,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4084680.0, ans=0.035 2024-08-18 20:28:14,929 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 26 from Vox, 39 from AS 2024-08-18 20:28:18,462 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 from AS 2024-08-18 20:28:20,704 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 19 from Vox, 19 from AS 2024-08-18 20:28:35,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4084880.0, ans=0.95 2024-08-18 20:28:35,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4084880.0, ans=0.025 2024-08-18 20:28:45,835 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-18 20:29:06,966 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7400, loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001365, whisper_loss=0.09152, over 21061.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001445, whisper_loss=0.09057, over 3889528.58 frames. 
], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:29:13,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4085080.0, ans=0.1 2024-08-18 20:29:25,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.347e+01 2.572e+01 2.832e+01 4.744e+01, threshold=5.145e+01, percent-clipped=0.0 2024-08-18 20:29:27,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4085180.0, ans=0.125 2024-08-18 20:29:28,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4085180.0, ans=0.05 2024-08-18 20:29:31,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4085180.0, ans=0.125 2024-08-18 20:29:33,453 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 21 from Vox, 27 from AS 2024-08-18 20:29:33,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4085180.0, ans=0.0 2024-08-18 20:29:37,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-18 20:29:57,713 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:30:15,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.88 vs. limit=15.0 2024-08-18 20:30:27,550 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
26 from LS+wenet, 19 from Vox, 25 from AS 2024-08-18 20:30:30,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4085480.0, ans=0.2 2024-08-18 20:30:38,995 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7450, loss[loss=0.08679, beats_loss=0.009945, ecapa_loss=0.0001429, whisper_loss=0.07542, over 22932.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001444, whisper_loss=0.09065, over 3900995.88 frames. ], batch size: 91, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:30:54,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4085580.0, ans=0.125 2024-08-18 20:31:05,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4085680.0, ans=0.1 2024-08-18 20:31:12,512 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=3.235e-01 2024-08-18 20:31:44,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4085880.0, ans=0.0 2024-08-18 20:32:09,059 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 28 from Vox, 26 from AS 2024-08-18 20:32:14,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2024-08-18 20:32:27,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7500, loss[loss=0.1182, beats_loss=0.01007, ecapa_loss=0.0001291, whisper_loss=0.1068, over 21481.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001437, whisper_loss=0.0905, over 3902962.43 frames. 
], batch size: 84, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:32:47,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.294e+01 2.519e+01 2.774e+01 4.079e+01, threshold=5.039e+01, percent-clipped=0.0 2024-08-18 20:32:50,564 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 17 from LS+wenet, 21 from Vox, 31 from AS 2024-08-18 20:32:52,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4086180.0, ans=0.09899494936611666 2024-08-18 20:32:54,302 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 24 from Vox, 36 from AS 2024-08-18 20:32:58,661 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 15 from Vox, 27 from AS 2024-08-18 20:33:06,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.17 vs. limit=22.5 2024-08-18 20:33:08,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4086180.0, ans=0.125 2024-08-18 20:33:28,402 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 21 from Vox, 40 from AS 2024-08-18 20:33:33,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4086280.0, ans=0.125 2024-08-18 20:33:36,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4086380.0, ans=0.125 2024-08-18 20:33:43,600 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 11 from Vox, 28 from AS 2024-08-18 20:33:56,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4086380.0, ans=0.125 2024-08-18 20:34:22,346 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 21 from Vox, 33 from AS 2024-08-18 20:34:25,851 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7550, loss[loss=0.1103, beats_loss=0.01086, ecapa_loss=9.5e-05, whisper_loss=0.09851, over 23183.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01045, ecapa_loss=0.0001436, whisper_loss=0.0907, over 3876363.68 frames. ], batch size: 86, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:34:26,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4086580.0, ans=0.125 2024-08-18 20:34:45,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4086580.0, ans=0.0 2024-08-18 20:35:02,546 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 27 from Vox, 36 from AS 2024-08-18 20:35:21,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=22.5 2024-08-18 20:35:31,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4086880.0, ans=0.1 2024-08-18 20:35:32,287 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 19 from Vox, 37 from AS 2024-08-18 20:35:43,039 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
20 from LS+wenet, 20 from Vox, 27 from AS 2024-08-18 20:35:45,202 WARNING [optim.py:496] (3/4) Scaling gradients by 0.029811669141054153, model_norm_threshold=50.385860443115234 2024-08-18 20:35:45,372 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.709e+05, grad_sumsq=5.709e+05, orig_rms_sq=1.000e+00 2024-08-18 20:35:51,140 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7600, loss[loss=0.1247, beats_loss=0.007031, ecapa_loss=0.0001696, whisper_loss=0.1159, over 22555.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001435, whisper_loss=0.09012, over 3853176.34 frames. ], batch size: 88, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:36:04,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.760e+01 2.298e+01 2.604e+01 3.012e+01 1.690e+03, threshold=5.209e+01, percent-clipped=1.0 2024-08-18 20:36:07,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4087180.0, ans=0.125 2024-08-18 20:36:19,640 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 21 from Vox, 20 from AS 2024-08-18 20:36:22,788 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 27 from Vox, 37 from AS 2024-08-18 20:36:28,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4087280.0, ans=0.0 2024-08-18 20:36:31,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4087280.0, ans=0.125 2024-08-18 20:36:33,729 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
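The WARNING from optim.py:496 above shows the complementary case to ordinary clipping: when a batch's gradient norm exceeds model_norm_threshold, gradients are scaled down by threshold / norm. A hedged sketch of that factor (`grad_scale_factor` is a hypothetical name; this is an assumed reading of the log, not the verbatim optim.py code). Inverting the logged factor 0.029811669... with threshold 50.3858... implies a gradient norm of about 1.69e+03, consistent with the 1.690e+03 maximum in the grad-norm quartile line that follows:

```python
def grad_scale_factor(grad_norm, model_norm_threshold):
    # Scale gradients down only when the norm exceeds the threshold;
    # otherwise leave them untouched (factor 1.0).
    return min(1.0, model_norm_threshold / grad_norm)

# Invert the logged warning to recover the offending gradient norm:
implied_norm = 50.385860443115234 / 0.029811669141054153
print(round(implied_norm))  # 1690
```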
20 from LS+wenet, 19 from Vox, 26 from AS 2024-08-18 20:36:49,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4087380.0, ans=0.125 2024-08-18 20:36:52,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4087480.0, ans=0.125 2024-08-18 20:36:52,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4087480.0, ans=0.0 2024-08-18 20:37:05,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7650, loss[loss=0.1, beats_loss=0.01052, ecapa_loss=0.0001451, whisper_loss=0.08805, over 18669.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001432, whisper_loss=0.09038, over 3850886.10 frames. ], batch size: 77, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:37:17,089 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS 2024-08-18 20:37:25,964 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 15 from Vox, 21 from AS 2024-08-18 20:37:36,437 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 14 from Vox, 33 from AS 2024-08-18 20:37:58,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.15 vs. 
limit=15.0 2024-08-18 20:38:03,117 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4087880.0, ans=0.0 2024-08-18 20:38:03,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4087880.0, ans=0.1 2024-08-18 20:38:13,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4087980.0, ans=0.125 2024-08-18 20:38:21,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7700, loss[loss=0.1076, beats_loss=0.01092, ecapa_loss=0.0001367, whisper_loss=0.09527, over 22530.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01035, ecapa_loss=0.0001435, whisper_loss=0.09063, over 3863816.96 frames. ], batch size: 92, lr: 2.19e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:38:28,415 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 from AS 2024-08-18 20:38:32,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4088080.0, ans=0.125 2024-08-18 20:38:34,623 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.256e+01 2.493e+01 2.776e+01 3.819e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-18 20:38:44,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4088180.0, ans=0.0 2024-08-18 20:38:51,547 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 23 from LS+wenet, 33 from Vox, 31 from AS 2024-08-18 20:39:05,432 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 20 from Vox, 35 from AS 2024-08-18 20:39:12,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4088380.0, ans=0.0 2024-08-18 20:39:13,736 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
19 from LS+wenet, 20 from Vox, 29 from AS 2024-08-18 20:39:15,362 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 20 from Vox, 26 from AS 2024-08-18 20:39:15,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4088380.0, ans=0.125 2024-08-18 20:39:22,319 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 23 from Vox, 34 from AS 2024-08-18 20:39:35,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7750, loss[loss=0.09356, beats_loss=0.01243, ecapa_loss=0.0001215, whisper_loss=0.07991, over 22367.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001423, whisper_loss=0.09003, over 3863267.76 frames. ], batch size: 90, lr: 2.19e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:39:42,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4088580.0, ans=0.1 2024-08-18 20:39:43,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4088580.0, ans=0.125 2024-08-18 20:39:45,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4088580.0, ans=0.125 2024-08-18 20:39:52,130 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 17 from Vox, 34 from AS 2024-08-18 20:39:59,023 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 28 from Vox, 33 from AS 2024-08-18 20:40:12,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=4088780.0, ans=15.0 2024-08-18 20:40:24,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4088880.0, ans=0.0 2024-08-18 20:40:24,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4088880.0, ans=0.125 2024-08-18 20:40:32,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4088880.0, ans=0.125 2024-08-18 20:40:34,367 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 18 from Vox, 30 from AS 2024-08-18 20:40:50,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7800, loss[loss=0.1076, beats_loss=0.008494, ecapa_loss=0.0001584, whisper_loss=0.09753, over 13953.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.09082, over 3889391.14 frames. 
], batch size: 56, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:40:52,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4089080.0, ans=0.0 2024-08-18 20:41:00,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4089080.0, ans=0.1 2024-08-18 20:41:02,866 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.709e+01 2.370e+01 2.620e+01 3.018e+01 4.706e+01, threshold=5.240e+01, percent-clipped=0.0 2024-08-18 20:41:27,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4089280.0, ans=0.0 2024-08-18 20:41:40,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.49 vs. limit=22.5 2024-08-18 20:42:01,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4089480.0, ans=0.125 2024-08-18 20:42:04,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7850, loss[loss=0.1066, beats_loss=0.0102, ecapa_loss=0.0001428, whisper_loss=0.09495, over 23592.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01047, ecapa_loss=0.0001416, whisper_loss=0.09088, over 3904211.63 frames. ], batch size: 94, lr: 2.18e-03, grad_scale: 1.152921504606847e+18 2024-08-18 20:42:08,468 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.92 vs. limit=10.0 2024-08-18 20:42:09,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.33 vs. 
limit=15.0 2024-08-18 20:42:11,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4089580.0, ans=0.2 2024-08-18 20:42:16,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4089580.0, ans=0.0 2024-08-18 20:42:19,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4089680.0, ans=0.125 2024-08-18 20:42:22,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4089680.0, ans=0.0 2024-08-18 20:42:35,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4089780.0, ans=0.0 2024-08-18 20:42:49,709 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 23 from Vox, 24 from AS 2024-08-18 20:43:13,439 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 16 from Vox, 45 from AS 2024-08-18 20:43:17,758 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7900, loss[loss=0.12, beats_loss=0.01028, ecapa_loss=0.0001286, whisper_loss=0.1084, over 23562.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001416, whisper_loss=0.09041, over 3887115.83 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:43:27,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.96 vs. 
limit=15.0 2024-08-18 20:43:32,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.368e+01 2.642e+01 2.985e+01 1.655e+02, threshold=5.283e+01, percent-clipped=2.0 2024-08-18 20:43:32,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4090180.0, ans=0.125 2024-08-18 20:43:37,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4090180.0, ans=0.0 2024-08-18 20:43:46,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4090280.0, ans=0.125 2024-08-18 20:43:48,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4090280.0, ans=0.125 2024-08-18 20:43:51,319 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 23 from Vox, 38 from AS 2024-08-18 20:43:57,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4090280.0, ans=0.2 2024-08-18 20:44:08,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. 
limit=15.0 2024-08-18 20:44:09,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4090380.0, ans=0.2 2024-08-18 20:44:15,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4090480.0, ans=0.0 2024-08-18 20:44:15,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4090480.0, ans=0.125 2024-08-18 20:44:17,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4090480.0, ans=0.125 2024-08-18 20:44:29,986 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 7950, loss[loss=0.08565, beats_loss=0.01398, ecapa_loss=0.0001377, whisper_loss=0.0703, over 22011.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001422, whisper_loss=0.09015, over 3898025.58 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:45:07,740 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 20:45:09,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4090780.0, ans=0.2 2024-08-18 20:45:18,030 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-18 20:45:23,329 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 20:45:40,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8000, loss[loss=0.103, beats_loss=0.01109, ecapa_loss=0.0001025, whisper_loss=0.0909, over 23461.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01058, ecapa_loss=0.0001412, whisper_loss=0.09038, over 3921360.12 frames. 
], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:45:51,294 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 15 from Vox, 29 from AS 2024-08-18 20:45:56,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.375e+01 2.587e+01 2.855e+01 4.354e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-18 20:45:56,423 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 18 from Vox, 38 from AS 2024-08-18 20:46:08,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4091280.0, ans=0.09899494936611666 2024-08-18 20:46:24,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4091380.0, ans=0.125 2024-08-18 20:46:38,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4091480.0, ans=0.0 2024-08-18 20:46:48,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4091480.0, ans=0.1 2024-08-18 20:46:52,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8050, loss[loss=0.1128, beats_loss=0.01066, ecapa_loss=0.0001343, whisper_loss=0.1008, over 15470.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001413, whisper_loss=0.09067, over 3881935.07 frames. ], batch size: 59, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:46:52,809 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 28 from Vox, 26 from AS 2024-08-18 20:46:53,982 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
26 from LS+wenet, 14 from Vox, 19 from AS 2024-08-18 20:46:54,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4091580.0, ans=0.125 2024-08-18 20:46:54,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4091580.0, ans=0.1 2024-08-18 20:46:58,127 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 from AS 2024-08-18 20:47:33,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.14 vs. limit=6.0 2024-08-18 20:47:37,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4091880.0, ans=0.125 2024-08-18 20:47:37,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4091880.0, ans=0.125 2024-08-18 20:47:39,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4091880.0, ans=0.0 2024-08-18 20:47:43,069 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 17 from Vox, 22 from AS 2024-08-18 20:47:46,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4091980.0, ans=0.125 2024-08-18 20:47:53,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4091980.0, ans=0.125 2024-08-18 20:47:54,695 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 32 from Vox, 35 from AS 2024-08-18 20:48:00,117 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8100, loss[loss=0.1228, beats_loss=0.0108, ecapa_loss=0.0001203, whisper_loss=0.1108, over 24790.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001419, whisper_loss=0.09011, over 3906545.35 frames. ], batch size: 94, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:48:04,747 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 17 from Vox, 32 from AS 2024-08-18 20:48:04,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4092080.0, ans=0.125 2024-08-18 20:48:09,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4092080.0, ans=0.0 2024-08-18 20:48:11,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-18 20:48:14,181 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.377e+01 2.582e+01 2.787e+01 4.982e+01, threshold=5.165e+01, percent-clipped=0.0 2024-08-18 20:48:24,084 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 37 from Vox, 32 from AS 2024-08-18 20:48:24,538 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.27 vs. limit=22.5 2024-08-18 20:48:30,805 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 28 from Vox, 31 from AS 2024-08-18 20:48:31,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4092280.0, ans=0.125 2024-08-18 20:48:33,361 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 21 from Vox, 36 from AS 2024-08-18 20:48:38,337 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. 
limit=6.0 2024-08-18 20:49:05,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=8.0 2024-08-18 20:49:10,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8150, loss[loss=0.1036, beats_loss=0.01063, ecapa_loss=0.0001555, whisper_loss=0.09141, over 20802.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001429, whisper_loss=0.08965, over 3920997.79 frames. ], batch size: 84, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:49:32,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4092680.0, ans=0.0 2024-08-18 20:49:35,066 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-18 20:49:39,604 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-18 20:49:58,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4092880.0, ans=0.125 2024-08-18 20:50:22,417 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8200, loss[loss=0.1069, beats_loss=0.01049, ecapa_loss=0.0001232, whisper_loss=0.09514, over 22718.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001444, whisper_loss=0.09029, over 3933514.79 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:50:23,965 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
17 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 20:50:28,030 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-18 20:50:29,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4093080.0, ans=0.95 2024-08-18 20:50:29,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4093080.0, ans=0.0 2024-08-18 20:50:35,261 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.361e+01 2.593e+01 2.842e+01 4.964e+01, threshold=5.187e+01, percent-clipped=0.0 2024-08-18 20:50:35,587 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 20:50:52,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4093280.0, ans=0.2 2024-08-18 20:50:56,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4093280.0, ans=0.1 2024-08-18 20:51:05,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.91 vs. limit=15.0 2024-08-18 20:51:09,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4093380.0, ans=0.125 2024-08-18 20:51:12,311 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 20:51:22,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4093480.0, ans=0.0 2024-08-18 20:51:26,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4093480.0, ans=0.125 2024-08-18 20:51:29,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8250, loss[loss=0.08869, beats_loss=0.01219, ecapa_loss=0.0001579, whisper_loss=0.07493, over 22006.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001439, whisper_loss=0.09048, over 3936312.40 frames. ], batch size: 94, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:51:34,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4093580.0, ans=0.125 2024-08-18 20:51:40,834 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 17 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 20:51:53,414 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 20:51:56,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4093680.0, ans=0.125 2024-08-18 20:52:14,607 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.14 vs. limit=6.0 2024-08-18 20:52:17,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4093880.0, ans=0.125 2024-08-18 20:52:39,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8300, loss[loss=0.1137, beats_loss=0.009326, ecapa_loss=0.0001614, whisper_loss=0.1027, over 22005.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01039, ecapa_loss=0.0001439, whisper_loss=0.09045, over 3919664.55 frames. 
], batch size: 87, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:52:47,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4094080.0, ans=0.125 2024-08-18 20:52:53,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.411e+01 2.666e+01 2.982e+01 3.666e+02, threshold=5.332e+01, percent-clipped=2.0 2024-08-18 20:53:02,808 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 19 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 20:53:07,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4094280.0, ans=0.0 2024-08-18 20:53:16,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2024-08-18 20:53:38,127 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 20:53:44,808 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-18 20:53:46,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4094480.0, ans=0.125 2024-08-18 20:53:48,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8350, loss[loss=0.09131, beats_loss=0.01196, ecapa_loss=0.0001526, whisper_loss=0.07782, over 20730.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001443, whisper_loss=0.08988, over 3938202.62 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:54:03,277 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 20:54:06,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4094680.0, ans=0.2 2024-08-18 20:54:15,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4094780.0, ans=0.0 2024-08-18 20:54:20,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4094780.0, ans=0.025 2024-08-18 20:54:32,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4094880.0, ans=0.2 2024-08-18 20:54:38,867 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 26 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-18 20:54:55,475 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8400, loss[loss=0.09942, beats_loss=0.009961, ecapa_loss=0.0001461, whisper_loss=0.088, over 22516.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001438, whisper_loss=0.08995, over 3921045.24 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:54:57,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4095080.0, ans=0.0 2024-08-18 20:55:01,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4095080.0, ans=0.1 2024-08-18 20:55:09,028 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.021e+01 2.413e+01 2.574e+01 2.874e+01 4.308e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-18 20:55:26,609 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.10 vs. 
limit=15.0 2024-08-18 20:55:46,838 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4095380.0, ans=0.2 2024-08-18 20:55:49,532 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 20:56:03,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4095480.0, ans=0.0 2024-08-18 20:56:05,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8450, loss[loss=0.1067, beats_loss=0.009108, ecapa_loss=0.0001088, whisper_loss=0.09653, over 17584.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.000144, whisper_loss=0.09044, over 3869403.97 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:56:08,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4095580.0, ans=0.125 2024-08-18 20:56:18,480 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 20:56:30,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4095680.0, ans=0.0 2024-08-18 20:56:31,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4095680.0, ans=10.0 2024-08-18 20:56:34,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4095780.0, ans=0.1 2024-08-18 20:56:38,292 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 16 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-18 20:56:44,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.64 vs. 
limit=15.0 2024-08-18 20:56:59,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4095880.0, ans=0.125 2024-08-18 20:56:59,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4095880.0, ans=0.07 2024-08-18 20:57:05,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4095980.0, ans=0.125 2024-08-18 20:57:16,109 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4096080.0, ans=0.2 2024-08-18 20:57:16,810 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8500, loss[loss=0.09374, beats_loss=0.01193, ecapa_loss=0.0001077, whisper_loss=0.08073, over 19780.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001449, whisper_loss=0.09136, over 3826656.10 frames. ], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:57:35,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.290e+01 2.484e+01 2.745e+01 4.794e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-18 20:57:36,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4096180.0, ans=0.125 2024-08-18 20:57:38,979 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-18 20:57:39,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4096180.0, ans=0.125 2024-08-18 20:57:51,907 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 27 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 20:58:14,354 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.01 vs. 
limit=12.0 2024-08-18 20:58:30,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4096480.0, ans=0.125 2024-08-18 20:58:33,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8550, loss[loss=0.1076, beats_loss=0.0101, ecapa_loss=0.0001517, whisper_loss=0.09602, over 20272.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01033, ecapa_loss=0.0001437, whisper_loss=0.09173, over 3870615.89 frames. ], batch size: 82, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:58:39,337 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.228e+05 2024-08-18 20:58:49,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4096680.0, ans=0.2 2024-08-18 20:59:09,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4096780.0, ans=0.1 2024-08-18 20:59:20,016 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-18 20:59:32,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4096980.0, ans=0.0 2024-08-18 20:59:37,365 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=12.0 2024-08-18 20:59:37,687 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05524347350001335, model_norm_threshold=49.67615509033203 2024-08-18 20:59:37,851 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.096e+05, grad_sumsq=1.096e+05, orig_rms_sq=1.000e+00 2024-08-18 20:59:46,742 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
21 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 20:59:47,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8600, loss[loss=0.09909, beats_loss=0.01003, ecapa_loss=0.0001326, whisper_loss=0.08773, over 17185.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01032, ecapa_loss=0.0001434, whisper_loss=0.09185, over 3856623.94 frames. ], batch size: 66, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 20:59:52,024 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 21:00:01,362 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 26 from Vox, 43 fro AS 2024-08-18 21:00:02,266 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.313e+01 2.617e+01 3.011e+01 8.992e+02, threshold=5.234e+01, percent-clipped=3.0 2024-08-18 21:00:10,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4097180.0, ans=0.09899494936611666 2024-08-18 21:00:12,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4097180.0, ans=0.1 2024-08-18 21:00:19,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4097280.0, ans=0.125 2024-08-18 21:00:38,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4097380.0, ans=0.0 2024-08-18 21:00:47,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4097480.0, ans=0.125 2024-08-18 21:00:57,474 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8650, loss[loss=0.08088, beats_loss=0.01173, ecapa_loss=0.0001338, whisper_loss=0.06782, over 21540.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01038, ecapa_loss=0.0001424, whisper_loss=0.0912, over 3865497.43 frames. 
], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:01:18,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.93 vs. limit=15.0 2024-08-18 21:01:24,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4097680.0, ans=0.125 2024-08-18 21:01:26,692 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 23 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 21:01:30,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4097780.0, ans=0.0 2024-08-18 21:01:31,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4097780.0, ans=0.125 2024-08-18 21:01:34,810 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0 2024-08-18 21:01:37,290 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 11 from LS+wenet, 11 from Vox, 39 fro AS 2024-08-18 21:01:56,307 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 19 from LS+wenet, 13 from Vox, 22 fro AS 2024-08-18 21:01:56,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2024-08-18 21:02:04,929 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-18 21:02:12,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8700, loss[loss=0.1057, beats_loss=0.01201, ecapa_loss=0.00013, whisper_loss=0.09239, over 20431.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001432, whisper_loss=0.09012, over 3825893.27 frames. 
], batch size: 81, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:02:13,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4098080.0, ans=0.0 2024-08-18 21:02:14,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4098080.0, ans=0.2 2024-08-18 21:02:25,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=15.0 2024-08-18 21:02:27,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.217e+01 2.441e+01 2.789e+01 4.170e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 21:02:46,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4098280.0, ans=0.0 2024-08-18 21:02:47,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4098280.0, ans=0.125 2024-08-18 21:02:51,598 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 21:03:05,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4098380.0, ans=0.125 2024-08-18 21:03:24,118 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8750, loss[loss=0.09375, beats_loss=0.01127, ecapa_loss=0.0001271, whisper_loss=0.08121, over 19167.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001436, whisper_loss=0.0903, over 3815054.20 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:03:24,433 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 31 from Vox, 32 fro AS 2024-08-18 21:03:37,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4098580.0, ans=0.2 2024-08-18 21:03:40,881 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.734e+01 2024-08-18 21:03:42,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4098680.0, ans=0.1 2024-08-18 21:03:43,460 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 24 from Vox, 35 fro AS 2024-08-18 21:03:56,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4098780.0, ans=0.1 2024-08-18 21:04:11,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=4098880.0, ans=15.0 2024-08-18 21:04:18,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4098880.0, ans=0.0 2024-08-18 21:04:18,189 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.051e+00 2024-08-18 21:04:21,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4098880.0, ans=0.125 2024-08-18 21:04:25,689 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-18 21:04:41,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8800, loss[loss=0.1028, beats_loss=0.01115, ecapa_loss=0.0001461, whisper_loss=0.09019, over 14722.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01042, ecapa_loss=0.0001438, whisper_loss=0.09074, over 3837866.42 frames. 
], batch size: 59, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:04:52,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4099080.0, ans=0.2 2024-08-18 21:04:56,561 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.312e+01 2.589e+01 2.893e+01 4.195e+01, threshold=5.178e+01, percent-clipped=0.0 2024-08-18 21:05:10,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.58 vs. limit=22.5 2024-08-18 21:05:10,978 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4099280.0, ans=0.1 2024-08-18 21:05:18,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2024-08-18 21:05:38,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4099380.0, ans=0.0 2024-08-18 21:05:58,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8850, loss[loss=0.09618, beats_loss=0.01152, ecapa_loss=0.0001461, whisper_loss=0.0832, over 21871.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001427, whisper_loss=0.09044, over 3874911.20 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:06:18,954 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-08-18 21:06:40,156 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
19 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 21:06:43,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4099880.0, ans=0.0 2024-08-18 21:06:48,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4099880.0, ans=0.1 2024-08-18 21:07:08,659 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-18 21:07:16,737 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8900, loss[loss=0.1072, beats_loss=0.01092, ecapa_loss=0.0001548, whisper_loss=0.0947, over 20045.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01059, ecapa_loss=0.0001418, whisper_loss=0.09021, over 3855547.51 frames. ], batch size: 82, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:07:21,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4100080.0, ans=0.125 2024-08-18 21:07:33,462 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.315e+01 2.485e+01 2.808e+01 3.547e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-18 21:07:51,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4100280.0, ans=0.1 2024-08-18 21:08:22,486 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 21:08:28,862 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 21:08:38,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 8950, loss[loss=0.07973, beats_loss=0.0115, ecapa_loss=0.0001021, whisper_loss=0.06721, over 15857.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.000143, whisper_loss=0.09031, over 3853494.37 frames. 
], batch size: 59, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:08:42,746 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.62 vs. limit=22.5 2024-08-18 21:08:43,703 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-18 21:08:45,173 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-18 21:08:53,232 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 21:08:55,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4100680.0, ans=0.125 2024-08-18 21:08:59,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4100680.0, ans=0.0 2024-08-18 21:09:01,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4100680.0, ans=0.125 2024-08-18 21:09:05,226 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 21:09:13,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4100780.0, ans=0.125 2024-08-18 21:09:13,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4100780.0, ans=0.125 2024-08-18 21:09:18,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4100780.0, ans=0.125 2024-08-18 21:09:22,312 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-08-18 21:09:23,071 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
20 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-18 21:09:23,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4100880.0, ans=0.2 2024-08-18 21:09:27,090 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 21:09:51,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9000, loss[loss=0.1239, beats_loss=0.008699, ecapa_loss=0.0001525, whisper_loss=0.1137, over 22625.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01055, ecapa_loss=0.0001421, whisper_loss=0.09093, over 3880406.50 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:09:51,918 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 21:10:26,987 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on ASR_libri: loss=0.2539, beats_loss=0, ecapa_loss=0.0005164, whisper_loss=0.2487, over 922467.00 frames. 2024-08-18 21:10:44,090 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on SV_voxceleb1: loss=0.004068, beats_loss=0, ecapa_loss=0.0004068, whisper_loss=0, over 939242.00 frames. 2024-08-18 21:12:26,400 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on AT_audioset: loss=0.0231, beats_loss=0.0231, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 21:12:26,410 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 21:12:40,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.331e+01 2.657e+01 3.077e+01 4.248e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-18 21:12:59,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4101280.0, ans=0.05 2024-08-18 21:13:05,366 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
24 from LS+wenet, 15 from Vox, 25 from AS
2024-08-18 21:13:07,835 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 39 from LS+wenet, 18 from Vox, 38 from AS
2024-08-18 21:13:25,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4101480.0, ans=0.1
2024-08-18 21:13:32,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4101480.0, ans=0.0
2024-08-18 21:13:38,883 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9050, loss[loss=0.09123, beats_loss=0.01501, ecapa_loss=0.0001171, whisper_loss=0.07506, over 21762.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01056, ecapa_loss=0.0001422, whisper_loss=0.09082, over 3896132.16 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:13:46,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4101580.0, ans=0.125
2024-08-18 21:13:54,701 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 from AS
2024-08-18 21:14:14,625 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0
2024-08-18 21:14:23,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4101880.0, ans=0.0
2024-08-18 21:14:29,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4101880.0, ans=0.0
2024-08-18 21:14:35,121 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 16 from Vox, 36 from AS
2024-08-18 21:14:35,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4101880.0, ans=0.1
2024-08-18 21:14:48,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4101980.0, ans=0.2
2024-08-18 21:14:48,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.13 vs. limit=15.0
2024-08-18 21:14:52,428 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9100, loss[loss=0.1227, beats_loss=0.009351, ecapa_loss=0.0001377, whisper_loss=0.112, over 21868.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01052, ecapa_loss=0.0001422, whisper_loss=0.09087, over 3879207.96 frames. ], batch size: 81, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:15:06,452 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.769e+01 2.448e+01 2.687e+01 2.999e+01 3.130e+02, threshold=5.374e+01, percent-clipped=2.0
2024-08-18 21:15:18,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4102180.0, ans=0.125
2024-08-18 21:15:31,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4102280.0, ans=0.1
2024-08-18 21:15:35,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.37 vs. limit=15.0
2024-08-18 21:16:04,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4102580.0, ans=0.0
2024-08-18 21:16:05,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9150, loss[loss=0.1151, beats_loss=0.01159, ecapa_loss=0.000114, whisper_loss=0.1023, over 23243.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01053, ecapa_loss=0.0001429, whisper_loss=0.09074, over 3884724.16 frames. ], batch size: 86, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:16:07,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4102580.0, ans=0.2
2024-08-18 21:16:23,101 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 27 from Vox, 37 from AS
2024-08-18 21:16:27,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4102680.0, ans=0.0
2024-08-18 21:16:27,771 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2024-08-18 21:16:37,134 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 19 from Vox, 22 from AS
2024-08-18 21:16:49,644 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 14 from LS+wenet, 26 from Vox, 41 from AS
2024-08-18 21:17:06,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4102980.0, ans=0.1
2024-08-18 21:17:09,840 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 17 from Vox, 40 from AS
2024-08-18 21:17:15,279 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9200, loss[loss=0.1222, beats_loss=0.01074, ecapa_loss=0.0001377, whisper_loss=0.1101, over 22328.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.09103, over 3911095.54 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:17:18,548 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 17 from Vox, 49 from AS
2024-08-18 21:17:29,685 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.018e+01 2.309e+01 2.617e+01 2.863e+01 2.175e+02, threshold=5.234e+01, percent-clipped=2.0
2024-08-18 21:17:37,827 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 14 from Vox, 27 from AS
2024-08-18 21:17:39,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4103180.0, ans=0.125
2024-08-18 21:17:48,249 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4103280.0, ans=0.1
2024-08-18 21:17:49,161 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 22 from Vox, 31 from AS
2024-08-18 21:17:51,200 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 22 from Vox, 26 from AS
2024-08-18 21:17:57,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4103380.0, ans=0.1
2024-08-18 21:18:22,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4103480.0, ans=0.125
2024-08-18 21:18:23,727 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 22 from Vox, 43 from AS
2024-08-18 21:18:26,827 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9250, loss[loss=0.09638, beats_loss=0.01106, ecapa_loss=0.0001243, whisper_loss=0.08408, over 19430.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01054, ecapa_loss=0.0001429, whisper_loss=0.09063, over 3893164.65 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:19:13,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4103880.0, ans=0.2
2024-08-18 21:19:13,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4103880.0, ans=0.07
2024-08-18 21:19:18,692 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 19 from Vox, 42 from AS
2024-08-18 21:19:32,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4103980.0, ans=0.125
2024-08-18 21:19:40,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9300, loss[loss=0.121, beats_loss=0.00805, ecapa_loss=0.000124, whisper_loss=0.1117, over 17909.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001426, whisper_loss=0.09038, over 3908855.49 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:19:44,958 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 from AS
2024-08-18 21:19:52,611 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 20 from Vox, 49 from AS
2024-08-18 21:19:53,833 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.381e+01 2.641e+01 3.038e+01 1.790e+02, threshold=5.283e+01, percent-clipped=1.0
2024-08-18 21:19:54,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0
2024-08-18 21:20:04,162 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 14 from Vox, 29 from AS
2024-08-18 21:20:06,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4104180.0, ans=0.125
2024-08-18 21:20:08,488 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 15 from LS+wenet, 20 from Vox, 37 from AS
2024-08-18 21:20:09,933 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 12 from Vox, 47 from AS
2024-08-18 21:20:32,085 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 23 from Vox, 34 from AS
2024-08-18 21:20:47,545 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 from AS
2024-08-18 21:20:49,106 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 29 from Vox, 34 from AS
2024-08-18 21:20:51,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9350, loss[loss=0.09463, beats_loss=0.01047, ecapa_loss=0.000153, whisper_loss=0.08263, over 15214.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01062, ecapa_loss=0.0001435, whisper_loss=0.08972, over 3885639.53 frames. ], batch size: 61, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:20:52,026 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 16 from Vox, 31 from AS
2024-08-18 21:20:55,504 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06869103014469147, model_norm_threshold=52.8264274597168
2024-08-18 21:20:55,671 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.21, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.260e+05, grad_sumsq=1.216e+07, orig_rms_sq=1.036e-02
2024-08-18 21:21:02,825 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 18 from Vox, 29 from AS
2024-08-18 21:21:12,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4104680.0, ans=0.0
2024-08-18 21:21:24,533 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 17 from Vox, 25 from AS
2024-08-18 21:21:44,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-08-18 21:21:47,813 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0
2024-08-18 21:21:49,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4104980.0, ans=0.0
2024-08-18 21:22:01,319 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9400, loss[loss=0.1135, beats_loss=0.009425, ecapa_loss=0.0001759, whisper_loss=0.1023, over 15288.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001433, whisper_loss=0.08937, over 3863805.25 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:22:01,945 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 13 from Vox, 23 from AS
2024-08-18 21:22:17,196 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.616e+01 2.374e+01 2.623e+01 3.026e+01 7.690e+02, threshold=5.245e+01, percent-clipped=3.0
2024-08-18 21:22:17,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4105180.0, ans=0.0
2024-08-18 21:22:23,435 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 18 from Vox, 40 from AS
2024-08-18 21:22:38,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4105280.0, ans=0.125
2024-08-18 21:22:43,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4105380.0, ans=0.125
2024-08-18 21:22:43,359 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.110e+01
2024-08-18 21:22:51,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4105380.0, ans=0.1
2024-08-18 21:22:57,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4105480.0, ans=0.05
2024-08-18 21:23:08,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4105480.0, ans=0.0
2024-08-18 21:23:12,776 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9450, loss[loss=0.1051, beats_loss=0.009637, ecapa_loss=0.0001011, whisper_loss=0.09442, over 15665.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001429, whisper_loss=0.09001, over 3867651.54 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:23:21,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4105580.0, ans=0.125
2024-08-18 21:23:23,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.85 vs. limit=15.0
2024-08-18 21:23:37,227 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 from AS
2024-08-18 21:24:12,695 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.60 vs. limit=15.0
2024-08-18 21:24:18,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4105980.0, ans=0.125
2024-08-18 21:24:25,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9500, loss[loss=0.09506, beats_loss=0.01218, ecapa_loss=0.000124, whisper_loss=0.08164, over 17872.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01055, ecapa_loss=0.0001434, whisper_loss=0.08907, over 3859863.62 frames. ], batch size: 72, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:24:39,375 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 25 from Vox, 34 from AS
2024-08-18 21:24:39,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4106080.0, ans=0.0
2024-08-18 21:24:42,304 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.355e+01 2.580e+01 2.944e+01 6.232e+01, threshold=5.161e+01, percent-clipped=0.0
2024-08-18 21:24:44,349 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS
2024-08-18 21:25:10,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4106380.0, ans=0.05
2024-08-18 21:25:18,355 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 from AS
2024-08-18 21:25:31,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4106480.0, ans=0.0
2024-08-18 21:25:40,308 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9550, loss[loss=0.0978, beats_loss=0.0115, ecapa_loss=0.0001622, whisper_loss=0.08467, over 22084.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001447, whisper_loss=0.08947, over 3850497.78 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:26:07,451 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 29 from Vox, 29 from AS
2024-08-18 21:26:25,300 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 19 from Vox, 45 from AS
2024-08-18 21:26:27,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4106880.0, ans=0.125
2024-08-18 21:26:30,091 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-18 21:26:33,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4106880.0, ans=0.125
2024-08-18 21:26:35,078 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4106880.0, ans=0.07
2024-08-18 21:26:43,457 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 28 from Vox, 35 from AS
2024-08-18 21:26:52,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4107080.0, ans=0.0
2024-08-18 21:26:53,583 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9600, loss[loss=0.1048, beats_loss=0.008799, ecapa_loss=0.0001358, whisper_loss=0.09463, over 17669.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001444, whisper_loss=0.08918, over 3851459.41 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:26:56,400 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 20 from Vox, 32 from AS
2024-08-18 21:26:58,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4107080.0, ans=0.2
2024-08-18 21:27:05,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4107080.0, ans=0.09899494936611666
2024-08-18 21:27:07,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.422e+01 2.704e+01 3.075e+01 1.101e+02, threshold=5.409e+01, percent-clipped=2.0
2024-08-18 21:27:13,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4107180.0, ans=0.0
2024-08-18 21:27:43,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4107380.0, ans=0.2
2024-08-18 21:27:46,902 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 20 from Vox, 40 from AS
2024-08-18 21:27:55,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4107480.0, ans=0.125
2024-08-18 21:28:05,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4107580.0, ans=0.2
2024-08-18 21:28:06,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9650, loss[loss=0.07539, beats_loss=0.0111, ecapa_loss=0.0001785, whisper_loss=0.0625, over 13214.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001464, whisper_loss=0.08964, over 3838827.93 frames. ], batch size: 58, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:28:10,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4107580.0, ans=0.2
2024-08-18 21:28:28,586 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 27 from Vox, 40 from AS
2024-08-18 21:28:33,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4107780.0, ans=0.2
2024-08-18 21:28:33,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4107780.0, ans=0.1
2024-08-18 21:28:34,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4107780.0, ans=0.1
2024-08-18 21:29:03,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4107980.0, ans=0.0
2024-08-18 21:29:08,419 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.27 vs. limit=5.0
2024-08-18 21:29:13,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9700, loss[loss=0.09826, beats_loss=0.01047, ecapa_loss=0.000125, whisper_loss=0.08655, over 16164.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001455, whisper_loss=0.09029, over 3857419.09 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:29:26,384 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 10 from Vox, 28 from AS
2024-08-18 21:29:27,634 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.335e+01 2.610e+01 2.872e+01 4.548e+01, threshold=5.221e+01, percent-clipped=0.0
2024-08-18 21:29:33,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4108180.0, ans=0.2
2024-08-18 21:30:03,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4108380.0, ans=0.1
2024-08-18 21:30:06,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4108380.0, ans=0.0
2024-08-18 21:30:22,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9750, loss[loss=0.1257, beats_loss=0.008455, ecapa_loss=0.0001782, whisper_loss=0.1154, over 22381.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001436, whisper_loss=0.0905, over 3845496.82 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:31:07,728 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0
2024-08-18 21:31:12,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4108880.0, ans=0.125
2024-08-18 21:31:13,662 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS
2024-08-18 21:31:18,021 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 25 from Vox, 32 from AS
2024-08-18 21:31:21,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4108980.0, ans=0.0
2024-08-18 21:31:21,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4108980.0, ans=0.125
2024-08-18 21:31:30,475 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9800, loss[loss=0.1108, beats_loss=0.00914, ecapa_loss=0.0001524, whisper_loss=0.1001, over 22538.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01046, ecapa_loss=0.0001433, whisper_loss=0.08962, over 3855462.61 frames. ], batch size: 90, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:31:30,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4109080.0, ans=0.125
2024-08-18 21:31:37,616 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4109080.0, ans=0.05
2024-08-18 21:31:39,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4109080.0, ans=0.125
2024-08-18 21:31:43,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.918e+01 2.279e+01 2.411e+01 2.686e+01 7.087e+01, threshold=4.821e+01, percent-clipped=1.0
2024-08-18 21:32:00,097 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 from AS
2024-08-18 21:32:06,924 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 from AS
2024-08-18 21:32:13,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4109380.0, ans=0.125
2024-08-18 21:32:15,166 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 19 from Vox, 17 from AS
2024-08-18 21:32:30,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4109480.0, ans=0.125
2024-08-18 21:32:34,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.05 vs. limit=6.0
2024-08-18 21:32:36,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9850, loss[loss=0.09063, beats_loss=0.01138, ecapa_loss=0.0001512, whisper_loss=0.07774, over 20922.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001439, whisper_loss=0.08992, over 3863674.27 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:32:48,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=22.5
2024-08-18 21:32:59,756 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.081e-03
2024-08-18 21:33:16,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4109780.0, ans=0.125
2024-08-18 21:33:25,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4109880.0, ans=0.0
2024-08-18 21:33:48,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9900, loss[loss=0.0874, beats_loss=0.0131, ecapa_loss=0.0001322, whisper_loss=0.07298, over 17445.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001435, whisper_loss=0.09045, over 3864330.40 frames. ], batch size: 70, lr: 2.18e-03, grad_scale: 1.152921504606847e+18
2024-08-18 21:33:48,501 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 25 from Vox, 29 from AS
2024-08-18 21:34:02,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.316e+01 2.554e+01 2.819e+01 9.199e+01, threshold=5.108e+01, percent-clipped=2.0
2024-08-18 21:34:06,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4110180.0, ans=0.0
2024-08-18 21:34:25,262 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0
2024-08-18 21:34:31,787 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-18 21:34:39,102 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 from AS
2024-08-18 21:34:59,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 9950, loss[loss=0.09368, beats_loss=0.01108, ecapa_loss=0.0001547, whisper_loss=0.08105, over 16499.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001442, whisper_loss=0.09069, over 3858519.12 frames. ], batch size: 67, lr: 2.18e-03, grad_scale: 1.152921504606847e+18
2024-08-18 21:35:04,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4110580.0, ans=0.0
2024-08-18 21:35:20,469 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4110680.0, ans=0.0
2024-08-18 21:35:21,417 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 21 from Vox, 29 from AS
2024-08-18 21:35:25,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4110680.0, ans=0.125
2024-08-18 21:35:37,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4110780.0, ans=0.125
2024-08-18 21:35:38,368 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 20 from LS+wenet, 29 from Vox, 30 from AS
2024-08-18 21:35:41,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4110880.0, ans=0.0
2024-08-18 21:35:42,462 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 23 from Vox, 43 from AS
2024-08-18 21:35:56,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.75 vs. limit=15.0
2024-08-18 21:36:06,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4111080.0, ans=0.2
2024-08-18 21:36:07,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10000, loss[loss=0.09978, beats_loss=0.008018, ecapa_loss=0.0001957, whisper_loss=0.0898, over 13596.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001456, whisper_loss=0.08995, over 3845676.77 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:36:10,793 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 13 from Vox, 30 from AS
2024-08-18 21:36:12,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4111080.0, ans=0.0
2024-08-18 21:36:17,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4111080.0, ans=0.125
2024-08-18 21:36:23,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.302e+01 2.583e+01 2.911e+01 1.277e+02, threshold=5.165e+01, percent-clipped=1.0
2024-08-18 21:36:28,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4111180.0, ans=0.125
2024-08-18 21:36:59,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4111380.0, ans=0.125
2024-08-18 21:37:02,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4111380.0, ans=0.0
2024-08-18 21:37:11,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4111480.0, ans=0.0
2024-08-18 21:37:12,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4111480.0, ans=0.1
2024-08-18 21:37:21,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10050, loss[loss=0.1046, beats_loss=0.009039, ecapa_loss=0.0001853, whisper_loss=0.09375, over 21680.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01047, ecapa_loss=0.0001453, whisper_loss=0.0903, over 3852868.63 frames. ], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:37:21,781 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 28 from Vox, 36 from AS
2024-08-18 21:37:45,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4111680.0, ans=0.2
2024-08-18 21:37:50,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4111780.0, ans=0.1
2024-08-18 21:37:50,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4111780.0, ans=0.1
2024-08-18 21:38:20,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4111980.0, ans=0.0
2024-08-18 21:38:24,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4111980.0, ans=0.125
2024-08-18 21:38:30,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10100, loss[loss=0.08541, beats_loss=0.0129, ecapa_loss=0.0001334, whisper_loss=0.07118, over 22739.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001454, whisper_loss=0.09084, over 3887197.96 frames. ], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:38:32,659 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 21 from Vox, 36 from AS
2024-08-18 21:38:32,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4112080.0, ans=0.2
2024-08-18 21:38:46,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.328e+01 2.605e+01 3.008e+01 2.431e+02, threshold=5.209e+01, percent-clipped=1.0
2024-08-18 21:38:55,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0
2024-08-18 21:39:07,812 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.94 vs. limit=6.0
2024-08-18 21:39:09,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.15 vs. limit=22.5
2024-08-18 21:39:11,291 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 22 from LS+wenet, 29 from Vox, 43 from AS
2024-08-18 21:39:26,638 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 23 from Vox, 40 from AS
2024-08-18 21:39:29,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4112480.0, ans=0.125
2024-08-18 21:39:30,683 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 27 from Vox, 32 from AS
2024-08-18 21:39:31,253 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.04 vs. limit=22.5
2024-08-18 21:39:36,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4112580.0, ans=0.05
2024-08-18 21:39:37,000 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10150, loss[loss=0.1201, beats_loss=0.008726, ecapa_loss=0.0001725, whisper_loss=0.1096, over 21904.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001466, whisper_loss=0.09094, over 3938180.50 frames. ], batch size: 89, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:39:59,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4112680.0, ans=0.125
2024-08-18 21:40:03,561 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 24 from Vox, 25 from AS
2024-08-18 21:40:05,843 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 from AS
2024-08-18 21:40:11,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.87 vs. limit=6.0
2024-08-18 21:40:13,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4112780.0, ans=0.0
2024-08-18 21:40:30,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4112980.0, ans=0.125
2024-08-18 21:40:42,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10200, loss[loss=0.1011, beats_loss=0.009184, ecapa_loss=0.0001571, whisper_loss=0.09035, over 19517.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001461, whisper_loss=0.09092, over 3940572.65 frames. ], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:40:43,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4113080.0, ans=0.125
2024-08-18 21:40:45,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4113080.0, ans=0.125
2024-08-18 21:40:54,201 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0
2024-08-18 21:40:57,269 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.349e+01 2.556e+01 2.919e+01 5.340e+01, threshold=5.112e+01, percent-clipped=2.0
2024-08-18 21:41:01,718 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 20 from Vox, 29 from AS
2024-08-18 21:41:15,329 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 from AS
2024-08-18 21:41:16,534 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 22 from Vox, 38 from AS
2024-08-18 21:41:19,191 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 20 from Vox, 22 from AS
2024-08-18 21:41:19,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4113280.0, ans=0.125
2024-08-18 21:41:25,924 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 29 from Vox, 28 from AS
2024-08-18 21:41:30,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.35 vs. limit=15.0
2024-08-18 21:41:36,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.99 vs. limit=22.5
2024-08-18 21:41:48,145 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 23 from Vox, 34 from AS
2024-08-18 21:41:49,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10250, loss[loss=0.09951, beats_loss=0.0114, ecapa_loss=0.0001458, whisper_loss=0.08665, over 19948.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01041, ecapa_loss=0.0001448, whisper_loss=0.09116, over 3938277.86 frames. ], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:41:57,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4113580.0, ans=0.5
2024-08-18 21:42:04,856 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 25 from LS+wenet, 25 from Vox, 44 from AS
2024-08-18 21:42:13,028 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5
2024-08-18 21:42:17,429 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 27 from LS+wenet, 16 from Vox, 22 from AS
2024-08-18 21:42:22,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4113780.0, ans=0.0
2024-08-18 21:42:28,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4113880.0, ans=0.1
2024-08-18 21:42:30,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=12.0
2024-08-18 21:42:45,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4113980.0, ans=0.125
2024-08-18 21:42:54,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10300, loss[loss=0.1052, beats_loss=0.01188, ecapa_loss=0.0001264, whisper_loss=0.09209, over 18826.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001452, whisper_loss=0.09067, over 3924089.01 frames. ], batch size: 74, lr: 2.18e-03, grad_scale: 5.764607523034235e+17
2024-08-18 21:43:08,361 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.322e+01 2.569e+01 2.887e+01 6.847e+01, threshold=5.137e+01, percent-clipped=1.0
2024-08-18 21:43:12,153 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 27 from Vox, 41 from AS
2024-08-18 21:43:20,239 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 28 from Vox, 31 from AS
2024-08-18 21:43:39,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4114380.0, ans=0.125
2024-08-18 21:43:46,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4114480.0, ans=0.0
2024-08-18 21:43:56,533 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
29 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-18 21:43:57,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10350, loss[loss=0.1127, beats_loss=0.01113, ecapa_loss=0.0001301, whisper_loss=0.1003, over 21688.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001449, whisper_loss=0.0903, over 3943392.86 frames. ], batch size: 84, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:44:03,300 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 21:44:08,851 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=12.0 2024-08-18 21:44:26,629 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 21:44:26,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4114780.0, ans=0.0 2024-08-18 21:44:41,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4114880.0, ans=0.2 2024-08-18 21:44:43,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4114880.0, ans=0.125 2024-08-18 21:44:45,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4114880.0, ans=0.2 2024-08-18 21:44:54,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4114980.0, ans=0.125 2024-08-18 21:45:03,375 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10400, loss[loss=0.1039, beats_loss=0.009735, ecapa_loss=0.000126, whisper_loss=0.09294, over 19463.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001446, whisper_loss=0.09104, over 3925422.23 frames. 
], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:45:17,376 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.312e+01 2.463e+01 2.686e+01 5.077e+01, threshold=4.927e+01, percent-clipped=0.0 2024-08-18 21:45:23,385 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 21:45:28,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4115280.0, ans=0.0 2024-08-18 21:45:40,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.90 vs. limit=22.5 2024-08-18 21:45:50,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4115380.0, ans=0.125 2024-08-18 21:46:08,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10450, loss[loss=0.09097, beats_loss=0.0118, ecapa_loss=0.0001699, whisper_loss=0.07747, over 21725.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001449, whisper_loss=0.09048, over 3905450.56 frames. ], batch size: 95, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:46:08,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4115580.0, ans=0.125 2024-08-18 21:46:08,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4115580.0, ans=0.0 2024-08-18 21:46:12,108 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-18 21:46:19,469 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 21:46:21,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4115680.0, ans=0.0 2024-08-18 21:46:22,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4115680.0, ans=0.125 2024-08-18 21:46:23,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0 2024-08-18 21:46:34,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4115780.0, ans=0.125 2024-08-18 21:46:45,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4115880.0, ans=0.125 2024-08-18 21:46:49,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4115880.0, ans=0.2 2024-08-18 21:47:11,855 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-18 21:47:14,642 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10500, loss[loss=0.08381, beats_loss=0.01231, ecapa_loss=0.0001396, whisper_loss=0.07011, over 21838.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001444, whisper_loss=0.09038, over 3928906.86 frames. ], batch size: 93, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:47:16,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4116080.0, ans=0.0 2024-08-18 21:47:29,475 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.013e+01 2.302e+01 2.513e+01 2.856e+01 4.600e+01, threshold=5.027e+01, percent-clipped=0.0 2024-08-18 21:47:31,568 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 21:48:05,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4116380.0, ans=0.125 2024-08-18 21:48:17,237 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-18 21:48:17,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4116480.0, ans=0.125 2024-08-18 21:48:25,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10550, loss[loss=0.117, beats_loss=0.007734, ecapa_loss=0.0001925, whisper_loss=0.1073, over 21280.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001436, whisper_loss=0.09021, over 3948121.83 frames. ], batch size: 88, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:48:26,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.73 vs. limit=15.0 2024-08-18 21:48:28,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4116580.0, ans=0.1 2024-08-18 21:48:28,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4116580.0, ans=0.125 2024-08-18 21:48:33,452 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-18 21:48:38,875 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
28 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 21:49:01,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4116780.0, ans=0.125 2024-08-18 21:49:19,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4116880.0, ans=0.0 2024-08-18 21:49:26,392 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 13 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-18 21:49:35,587 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10600, loss[loss=0.07951, beats_loss=0.01264, ecapa_loss=0.0001397, whisper_loss=0.06548, over 22234.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001429, whisper_loss=0.09011, over 3908643.04 frames. ], batch size: 94, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:49:50,614 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.667e+01 2.337e+01 2.542e+01 2.897e+01 3.687e+01, threshold=5.085e+01, percent-clipped=0.0 2024-08-18 21:50:08,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4117280.0, ans=0.125 2024-08-18 21:50:16,378 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 21:50:24,831 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 21:50:40,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4117480.0, ans=0.1 2024-08-18 21:50:43,764 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10650, loss[loss=0.1039, beats_loss=0.009937, ecapa_loss=0.000156, whisper_loss=0.09241, over 22404.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.000143, whisper_loss=0.09022, over 3902600.45 frames. 
], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:50:45,330 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-18 21:50:59,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4117680.0, ans=0.0 2024-08-18 21:51:03,670 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 21:51:12,831 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 21:51:33,079 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 21:51:33,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4117880.0, ans=0.1 2024-08-18 21:51:51,063 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10700, loss[loss=0.09963, beats_loss=0.01227, ecapa_loss=0.0001354, whisper_loss=0.08601, over 17188.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001427, whisper_loss=0.09053, over 3901179.62 frames. ], batch size: 70, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:52:05,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.628e+01 2.357e+01 2.585e+01 2.828e+01 1.514e+02, threshold=5.170e+01, percent-clipped=1.0 2024-08-18 21:52:12,049 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-18 21:52:14,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4118180.0, ans=0.07 2024-08-18 21:52:15,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4118280.0, ans=0.125 2024-08-18 21:52:22,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4118280.0, ans=0.2 2024-08-18 21:52:24,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=4118280.0, ans=0.02 2024-08-18 21:52:25,177 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 21:52:26,449 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-18 21:52:26,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4118280.0, ans=0.125 2024-08-18 21:52:30,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4118380.0, ans=0.2 2024-08-18 21:52:49,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4118480.0, ans=0.2 2024-08-18 21:52:56,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10750, loss[loss=0.09534, beats_loss=0.0103, ecapa_loss=0.0001513, whisper_loss=0.08353, over 18840.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001433, whisper_loss=0.09114, over 3884632.58 frames. ], batch size: 79, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:52:57,709 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 21:53:00,514 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-18 21:53:01,773 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-18 21:53:08,258 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-18 21:53:14,335 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 21:53:15,595 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 24 from Vox, 18 fro AS 2024-08-18 21:53:27,330 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 27 from Vox, 22 fro AS 2024-08-18 21:53:36,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4118880.0, ans=0.0 2024-08-18 21:53:36,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4118880.0, ans=0.125 2024-08-18 21:53:40,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=22.5 2024-08-18 21:53:49,591 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-18 21:53:53,472 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 21:54:00,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4119080.0, ans=0.0 2024-08-18 21:54:01,144 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10800, loss[loss=0.09094, beats_loss=0.009187, ecapa_loss=0.0001793, whisper_loss=0.07996, over 16238.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01049, ecapa_loss=0.0001433, whisper_loss=0.0906, over 3866380.80 frames. 
], batch size: 69, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:54:15,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.266e+01 2.523e+01 2.854e+01 3.753e+01, threshold=5.046e+01, percent-clipped=0.0 2024-08-18 21:54:18,735 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2024-08-18 21:54:33,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.11 vs. limit=22.5 2024-08-18 21:54:38,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4119280.0, ans=0.2 2024-08-18 21:55:00,198 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4119480.0, ans=0.125 2024-08-18 21:55:06,182 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10850, loss[loss=0.1018, beats_loss=0.01034, ecapa_loss=0.00016, whisper_loss=0.08988, over 18753.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01052, ecapa_loss=0.0001435, whisper_loss=0.0909, over 3880217.69 frames. ], batch size: 78, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:55:07,752 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 21:55:08,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4119580.0, ans=0.125 2024-08-18 21:55:35,258 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-18 21:55:48,622 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.99 vs. 
limit=22.5 2024-08-18 21:55:53,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4119880.0, ans=0.1 2024-08-18 21:55:57,440 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-18 21:56:02,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4119980.0, ans=0.2 2024-08-18 21:56:02,991 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2024-08-18 21:56:13,959 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10900, loss[loss=0.08678, beats_loss=0.01137, ecapa_loss=0.0001516, whisper_loss=0.07389, over 17554.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001439, whisper_loss=0.09157, over 3938012.28 frames. ], batch size: 74, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:56:22,504 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4120080.0, ans=0.05 2024-08-18 21:56:26,992 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-18 21:56:27,995 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.367e+01 2.602e+01 2.908e+01 4.089e+01, threshold=5.204e+01, percent-clipped=0.0 2024-08-18 21:56:47,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.07 vs. limit=15.0 2024-08-18 21:57:05,188 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.31 vs. 
limit=15.0 2024-08-18 21:57:16,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. limit=10.0 2024-08-18 21:57:17,738 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 21:57:19,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 10950, loss[loss=0.0955, beats_loss=0.009621, ecapa_loss=0.0001405, whisper_loss=0.08448, over 14125.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09156, over 3947916.68 frames. ], batch size: 54, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:57:26,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4120580.0, ans=0.0 2024-08-18 21:57:29,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4120580.0, ans=0.125 2024-08-18 21:57:38,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4120680.0, ans=0.0 2024-08-18 21:58:23,754 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11000, loss[loss=0.1311, beats_loss=0.006767, ecapa_loss=0.0001794, whisper_loss=0.1225, over 15542.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.0104, ecapa_loss=0.0001452, whisper_loss=0.09147, over 3943190.28 frames. ], batch size: 60, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:58:38,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.293e+01 2.499e+01 2.865e+01 3.776e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-18 21:58:57,530 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-18 21:59:02,817 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.452e+00 2024-08-18 21:59:07,833 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4121380.0, ans=0.125 2024-08-18 21:59:14,126 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 17 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-18 21:59:17,069 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-18 21:59:21,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4121480.0, ans=0.125 2024-08-18 21:59:26,582 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.93 vs. limit=22.5 2024-08-18 21:59:28,274 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11050, loss[loss=0.08283, beats_loss=0.01356, ecapa_loss=0.0001146, whisper_loss=0.06813, over 21318.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001444, whisper_loss=0.09087, over 3959116.83 frames. 
], batch size: 87, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 21:59:31,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4121580.0, ans=0.1 2024-08-18 21:59:32,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4121580.0, ans=0.2 2024-08-18 21:59:49,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4121680.0, ans=0.09899494936611666 2024-08-18 21:59:51,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4121680.0, ans=0.125 2024-08-18 21:59:58,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4121780.0, ans=0.2 2024-08-18 22:00:22,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4121980.0, ans=0.0 2024-08-18 22:00:23,209 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-18 22:00:24,602 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-18 22:00:24,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4121980.0, ans=0.125 2024-08-18 22:00:33,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11100, loss[loss=0.1083, beats_loss=0.01017, ecapa_loss=0.0001328, whisper_loss=0.09685, over 23282.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0105, ecapa_loss=0.0001443, whisper_loss=0.09093, over 3968333.64 frames. 
], batch size: 92, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:00:43,892 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2024-08-18 22:00:47,896 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.978e+01 2.366e+01 2.564e+01 2.884e+01 4.228e+01, threshold=5.128e+01, percent-clipped=0.0 2024-08-18 22:00:55,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4122180.0, ans=0.125 2024-08-18 22:00:56,717 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.63 vs. limit=12.0 2024-08-18 22:00:58,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4122280.0, ans=0.2 2024-08-18 22:01:30,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4122480.0, ans=0.125 2024-08-18 22:01:30,718 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=15.0 2024-08-18 22:01:38,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4122580.0, ans=0.125 2024-08-18 22:01:39,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11150, loss[loss=0.09594, beats_loss=0.01292, ecapa_loss=0.0001171, whisper_loss=0.08185, over 20336.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001438, whisper_loss=0.09132, over 3971688.99 frames. 
], batch size: 79, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:01:42,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4122580.0, ans=0.125 2024-08-18 22:01:42,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2024-08-18 22:01:55,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4122680.0, ans=0.1 2024-08-18 22:01:55,978 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-18 22:01:57,237 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-18 22:01:57,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4122680.0, ans=0.0 2024-08-18 22:02:21,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4122880.0, ans=0.1 2024-08-18 22:02:26,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4122880.0, ans=0.0 2024-08-18 22:02:37,743 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 22:02:41,471 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 18 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 22:02:43,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11200, loss[loss=0.1043, beats_loss=0.01048, ecapa_loss=0.0001336, whisper_loss=0.09244, over 22145.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.0001445, whisper_loss=0.09123, over 3943402.64 frames. 
], batch size: 91, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:02:58,230 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.383e+01 2.625e+01 2.814e+01 6.266e+01, threshold=5.250e+01, percent-clipped=1.0 2024-08-18 22:03:04,790 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-18 22:03:23,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.59 vs. limit=15.0 2024-08-18 22:03:30,969 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-18 22:03:34,887 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-18 22:03:41,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4123480.0, ans=0.2 2024-08-18 22:03:48,700 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11250, loss[loss=0.08368, beats_loss=0.0123, ecapa_loss=0.0001281, whisper_loss=0.0701, over 13722.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001443, whisper_loss=0.09148, over 3923640.41 frames. ], batch size: 56, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:04:02,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4123680.0, ans=0.0 2024-08-18 22:04:21,512 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-18 22:04:28,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4123880.0, ans=0.1 2024-08-18 22:04:40,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4123980.0, ans=0.0 2024-08-18 22:04:44,919 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 28 from Vox, 32 fro AS 2024-08-18 22:04:53,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11300, loss[loss=0.09524, beats_loss=0.01233, ecapa_loss=0.0001256, whisper_loss=0.08166, over 16269.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001437, whisper_loss=0.09099, over 3931790.81 frames. ], batch size: 64, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:04:56,579 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-18 22:04:59,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4124080.0, ans=0.125 2024-08-18 22:05:05,214 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 22:05:07,793 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.238e+01 2.501e+01 2.783e+01 2.394e+02, threshold=5.001e+01, percent-clipped=1.0 2024-08-18 22:05:18,297 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-18 22:05:43,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4124380.0, ans=0.125 2024-08-18 22:05:54,873 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-18 22:05:58,561 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11350, loss[loss=0.1027, beats_loss=0.009453, ecapa_loss=0.0001558, whisper_loss=0.09167, over 20308.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.09017, over 3918795.45 frames. ], batch size: 77, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:06:19,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2024-08-18 22:06:21,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4124680.0, ans=0.2 2024-08-18 22:06:26,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4124780.0, ans=0.0 2024-08-18 22:06:46,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4124880.0, ans=0.1 2024-08-18 22:07:03,789 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-18 22:07:04,812 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11400, loss[loss=0.1008, beats_loss=0.01094, ecapa_loss=0.0001336, whisper_loss=0.08853, over 16059.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001425, whisper_loss=0.09063, over 3893729.39 frames. ], batch size: 63, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:07:11,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4125080.0, ans=0.0 2024-08-18 22:07:18,933 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.937e+01 2.399e+01 2.610e+01 2.996e+01 4.711e+01, threshold=5.221e+01, percent-clipped=0.0 2024-08-18 22:07:21,663 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
16 from LS+wenet, 9 from Vox, 29 fro AS 2024-08-18 22:07:43,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4125380.0, ans=0.125 2024-08-18 22:08:05,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.53 vs. limit=10.0 2024-08-18 22:08:09,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11450, loss[loss=0.119, beats_loss=0.008616, ecapa_loss=0.0001637, whisper_loss=0.1088, over 18409.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001431, whisper_loss=0.09041, over 3888599.01 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:08:10,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4125580.0, ans=0.0 2024-08-18 22:08:12,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4125580.0, ans=0.1 2024-08-18 22:08:13,640 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-18 22:08:39,938 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 22:08:59,611 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 30 from Vox, 32 fro AS 2024-08-18 22:09:00,957 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
31 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 22:09:03,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4125980.0, ans=0.1 2024-08-18 22:09:03,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4125980.0, ans=0.0 2024-08-18 22:09:09,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2024-08-18 22:09:11,187 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 22:09:13,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.55 vs. limit=15.0 2024-08-18 22:09:15,161 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11500, loss[loss=0.07783, beats_loss=0.01219, ecapa_loss=0.0001584, whisper_loss=0.06406, over 17480.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001424, whisper_loss=0.09001, over 3910752.61 frames. ], batch size: 75, lr: 2.18e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:09:29,383 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.741e+01 2.337e+01 2.548e+01 2.823e+01 1.618e+02, threshold=5.097e+01, percent-clipped=1.0 2024-08-18 22:09:35,228 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 22:09:35,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4126180.0, ans=0.1 2024-08-18 22:10:10,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4126480.0, ans=0.0 2024-08-18 22:10:18,676 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
30 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 22:10:21,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11550, loss[loss=0.1182, beats_loss=0.01018, ecapa_loss=0.0001454, whisper_loss=0.1066, over 22153.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.0001423, whisper_loss=0.09103, over 3928864.15 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:10:21,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4126580.0, ans=0.2 2024-08-18 22:10:28,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.27 vs. limit=22.5 2024-08-18 22:10:48,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4126780.0, ans=0.05 2024-08-18 22:10:48,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4126780.0, ans=0.0 2024-08-18 22:10:57,852 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 22:10:58,958 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 12 from Vox, 28 fro AS 2024-08-18 22:11:04,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4126880.0, ans=0.2 2024-08-18 22:11:23,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4126980.0, ans=0.125 2024-08-18 22:11:23,474 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. 
limit=6.0 2024-08-18 22:11:29,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11600, loss[loss=0.09027, beats_loss=0.01356, ecapa_loss=0.0001188, whisper_loss=0.07552, over 16942.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.000142, whisper_loss=0.09076, over 3902128.78 frames. ], batch size: 67, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:11:29,638 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-18 22:11:30,906 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 22:11:44,869 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.392e+01 2.631e+01 2.895e+01 1.114e+02, threshold=5.261e+01, percent-clipped=3.0 2024-08-18 22:12:19,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.34 vs. limit=15.0 2024-08-18 22:12:23,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4127380.0, ans=0.2 2024-08-18 22:12:23,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4127380.0, ans=0.125 2024-08-18 22:12:31,741 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 33 from LS+wenet, 22 from Vox, 19 fro AS 2024-08-18 22:12:35,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2024-08-18 22:12:39,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11650, loss[loss=0.07178, beats_loss=0.01056, ecapa_loss=0.0001214, whisper_loss=0.06, over 14254.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09129, over 3892926.20 frames. 
], batch size: 56, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:12:40,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4127580.0, ans=0.125 2024-08-18 22:12:56,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4127680.0, ans=0.0 2024-08-18 22:13:00,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4127680.0, ans=0.125 2024-08-18 22:13:00,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4127680.0, ans=0.2 2024-08-18 22:13:00,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4127680.0, ans=0.125 2024-08-18 22:13:07,024 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 22:13:10,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4127780.0, ans=0.2 2024-08-18 22:13:24,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4127880.0, ans=0.1 2024-08-18 22:13:24,463 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.018e+00 2024-08-18 22:13:39,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.08 vs. limit=15.0 2024-08-18 22:13:43,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11700, loss[loss=0.09603, beats_loss=0.009135, ecapa_loss=0.0001864, whisper_loss=0.08503, over 19417.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001418, whisper_loss=0.09044, over 3888015.44 frames. 
], batch size: 80, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 22:13:47,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4128080.0, ans=0.125 2024-08-18 22:13:53,168 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 13 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 22:13:58,082 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.372e+01 2.671e+01 2.866e+01 2.586e+02, threshold=5.342e+01, percent-clipped=1.0 2024-08-18 22:14:08,758 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-18 22:14:11,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2024-08-18 22:14:20,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4128280.0, ans=0.1 2024-08-18 22:14:24,046 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 22:14:25,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4128380.0, ans=0.0 2024-08-18 22:14:35,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4128480.0, ans=0.125 2024-08-18 22:14:36,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4128480.0, ans=0.0 2024-08-18 22:14:39,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4128480.0, ans=0.125 2024-08-18 22:14:45,402 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 22:14:47,899 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11750, loss[loss=0.1064, beats_loss=0.01001, ecapa_loss=0.0001401, whisper_loss=0.09498, over 16845.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.09127, over 3931752.40 frames. ], batch size: 65, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:14:49,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4128580.0, ans=0.1 2024-08-18 22:14:51,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4128580.0, ans=0.125 2024-08-18 22:14:55,561 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-18 22:15:01,774 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 22:15:10,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4128680.0, ans=0.125 2024-08-18 22:15:19,532 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-18 22:15:38,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4128980.0, ans=0.125 2024-08-18 22:15:43,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=4128980.0, ans=0.2 2024-08-18 22:15:50,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11800, loss[loss=0.08892, beats_loss=0.01014, ecapa_loss=0.0001474, whisper_loss=0.07731, over 22227.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001426, whisper_loss=0.0913, over 3898644.54 frames. 
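Editor's note: between batch 11700 and batch 11750 the logged `grad_scale` drops from 5.764607523034235e+17 to 2.8823037615171174e+17, i.e. exactly halves. This is the characteristic backoff of an AMP loss scaler (as in `torch.cuda.amp.GradScaler`, which the header's `'use_amp': True` suggests is in play): halve the scale when non-finite gradients are detected, grow it again after a run of clean steps. A hedged, simplified sketch of that rule, not the run's actual optimizer code:

```python
def update_scale(scale, found_inf, growth_factor=2.0, backoff_factor=0.5):
    # Simplified AMP loss-scale update: on overflow, back off by half
    # (e.g. 5.76e+17 -> 2.88e+17 as seen in the log); otherwise grow.
    # The real GradScaler also waits growth_interval clean steps before
    # growing; that bookkeeping is omitted here.
    if found_inf:
        return scale * backoff_factor
    return scale * growth_factor

halved = update_scale(5.764607523034235e+17, found_inf=True)
# halved == 2.8823037615171174e+17 (halving a power of two is exact)
```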
], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:16:01,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4129080.0, ans=0.0 2024-08-18 22:16:03,752 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-18 22:16:06,192 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.286e+01 2.543e+01 2.807e+01 3.749e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-18 22:16:06,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4129180.0, ans=0.1 2024-08-18 22:16:06,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4129180.0, ans=0.125 2024-08-18 22:16:18,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4129280.0, ans=0.1 2024-08-18 22:16:19,198 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-18 22:16:25,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4129280.0, ans=0.0 2024-08-18 22:16:27,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4129280.0, ans=0.125 2024-08-18 22:16:32,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4129380.0, ans=0.2 2024-08-18 22:16:38,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4129380.0, ans=0.1 2024-08-18 22:16:55,186 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11850, loss[loss=0.1094, beats_loss=0.01085, ecapa_loss=0.0001092, whisper_loss=0.09745, over 20491.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.09102, over 3900003.38 frames. ], batch size: 76, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:17:00,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4129580.0, ans=0.0 2024-08-18 22:17:01,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4129580.0, ans=0.2 2024-08-18 22:17:05,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4129580.0, ans=0.125 2024-08-18 22:17:20,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4129780.0, ans=0.1 2024-08-18 22:17:30,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4129780.0, ans=0.2 2024-08-18 22:17:37,083 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4129880.0, ans=0.0 2024-08-18 22:17:45,469 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 12 from Vox, 24 fro AS 2024-08-18 22:17:51,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4129980.0, ans=0.125 2024-08-18 22:17:59,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11900, loss[loss=0.1013, beats_loss=0.0113, ecapa_loss=0.0001258, whisper_loss=0.08879, over 21994.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01056, ecapa_loss=0.0001429, whisper_loss=0.09061, over 3896275.58 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:18:05,884 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4130080.0, ans=0.2 2024-08-18 22:18:14,415 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.051e+01 2.305e+01 2.478e+01 2.772e+01 3.965e+01, threshold=4.956e+01, percent-clipped=0.0 2024-08-18 22:18:30,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4130280.0, ans=0.125 2024-08-18 22:18:44,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4130380.0, ans=0.125 2024-08-18 22:18:50,979 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
22 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-18 22:18:56,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4130480.0, ans=0.125 2024-08-18 22:18:57,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4130480.0, ans=0.0 2024-08-18 22:19:03,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 11950, loss[loss=0.08907, beats_loss=0.01042, ecapa_loss=0.0001254, whisper_loss=0.0774, over 16011.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01049, ecapa_loss=0.0001433, whisper_loss=0.09071, over 3874191.70 frames. ], batch size: 62, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:19:09,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4130580.0, ans=0.2 2024-08-18 22:19:13,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4130580.0, ans=0.125 2024-08-18 22:19:23,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4130680.0, ans=0.0 2024-08-18 22:19:24,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4130680.0, ans=0.1 2024-08-18 22:19:26,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-08-18 22:19:28,033 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-18 22:19:58,074 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 22:19:59,235 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
23 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-18 22:20:01,889 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 22:20:06,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12000, loss[loss=0.1164, beats_loss=0.007496, ecapa_loss=0.0001265, whisper_loss=0.1076, over 15302.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09033, over 3883283.95 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:20:06,934 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 22:20:47,562 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on ASR_libri: loss=0.2535, beats_loss=0, ecapa_loss=0.0005126, whisper_loss=0.2484, over 922467.00 frames. 2024-08-18 22:21:06,659 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on SV_voxceleb1: loss=0.004051, beats_loss=0, ecapa_loss=0.0004051, whisper_loss=0, over 939242.00 frames. 2024-08-18 22:22:55,542 INFO [train_multi_KD3.py:1149] (3/4) Epoch 28, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 22:22:55,546 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 22:23:11,594 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.291e+01 2.547e+01 2.883e+01 4.329e+01, threshold=5.094e+01, percent-clipped=0.0 2024-08-18 22:23:11,790 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-18 22:23:13,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4131180.0, ans=0.125 2024-08-18 22:23:13,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4131180.0, ans=0.1 2024-08-18 22:23:25,744 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
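Editor's note: the three validation records above (batch 12000) each exercise a single task, with the unused heads' losses logged as 0 (ASR_libri has `beats_loss=0`, SV_voxceleb1 has only `ecapa_loss`, AT_audioset has only `beats_loss`), and the totals are again consistent with the head scales beats=1.0, ecapa=10.0, whisper=1.0 from the run config. A small arithmetic check (an illustration, not the validation code):

```python
def val_total(beats, ecapa, whisper):
    # Same weighted sum as in training, with the scales from the header.
    return 1.0 * beats + 10.0 * ecapa + 1.0 * whisper

asr = val_total(0.0, 0.0005126, 0.2484)   # ASR_libri    -> ~0.2535
sv  = val_total(0.0, 0.0004051, 0.0)      # SV_voxceleb1 -> ~0.004051
at  = val_total(0.02313, 0.0, 0.0)        # AT_audioset  -> 0.02313
```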
18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 22:23:49,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4131480.0, ans=0.1 2024-08-18 22:23:57,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4131480.0, ans=0.0 2024-08-18 22:23:59,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12050, loss[loss=0.1073, beats_loss=0.008639, ecapa_loss=0.0001571, whisper_loss=0.09707, over 15024.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001458, whisper_loss=0.09044, over 3873130.45 frames. ], batch size: 59, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:24:02,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4131580.0, ans=0.125 2024-08-18 22:24:08,802 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 22:24:26,590 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 22:24:33,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4131780.0, ans=0.125 2024-08-18 22:24:49,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4131980.0, ans=0.125 2024-08-18 22:24:50,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4131980.0, ans=0.0 2024-08-18 22:25:03,168 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12100, loss[loss=0.0958, beats_loss=0.01163, ecapa_loss=0.0001896, whisper_loss=0.08228, over 20489.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001446, whisper_loss=0.09012, over 3901859.13 frames. 
], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:25:17,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.763e+01 2.242e+01 2.477e+01 2.714e+01 3.521e+01, threshold=4.955e+01, percent-clipped=0.0 2024-08-18 22:25:22,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4132180.0, ans=0.1 2024-08-18 22:25:39,685 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.30 vs. limit=22.5 2024-08-18 22:25:43,745 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-18 22:25:44,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.51 vs. limit=15.0 2024-08-18 22:25:49,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.93 vs. limit=22.5 2024-08-18 22:25:51,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4132380.0, ans=0.0 2024-08-18 22:25:54,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4132480.0, ans=0.0 2024-08-18 22:26:09,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12150, loss[loss=0.08409, beats_loss=0.01152, ecapa_loss=0.000157, whisper_loss=0.07101, over 17654.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001447, whisper_loss=0.08944, over 3847353.38 frames. 
], batch size: 75, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:26:12,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4132580.0, ans=0.0 2024-08-18 22:26:15,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4132580.0, ans=0.2 2024-08-18 22:26:46,633 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:26:46,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4132780.0, ans=0.0 2024-08-18 22:26:57,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4132780.0, ans=0.0 2024-08-18 22:27:01,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4132880.0, ans=0.0 2024-08-18 22:27:09,428 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 21 from LS+wenet, 25 from Vox, 47 fro AS 2024-08-18 22:27:30,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4132980.0, ans=0.125 2024-08-18 22:27:40,522 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12200, loss[loss=0.08246, beats_loss=0.01161, ecapa_loss=0.0002076, whisper_loss=0.06877, over 21825.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001435, whisper_loss=0.08925, over 3837874.50 frames. 
], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:27:51,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4133080.0, ans=0.2 2024-08-18 22:28:04,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4133180.0, ans=0.125 2024-08-18 22:28:05,923 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 31 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 22:28:07,544 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.413e+01 2.660e+01 2.941e+01 4.822e+01, threshold=5.320e+01, percent-clipped=0.0 2024-08-18 22:28:19,375 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 22:28:28,229 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 22:28:31,431 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-18 22:28:36,286 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4133280.0, ans=0.125 2024-08-18 22:28:45,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5 2024-08-18 22:29:06,410 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-18 22:29:19,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=4133480.0, ans=22.5 2024-08-18 22:29:24,088 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
33 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-18 22:29:28,665 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12250, loss[loss=0.08752, beats_loss=0.01115, ecapa_loss=0.0001403, whisper_loss=0.07497, over 18333.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001425, whisper_loss=0.08962, over 3876107.60 frames. ], batch size: 73, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:29:55,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4133680.0, ans=0.125 2024-08-18 22:30:25,090 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 30 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 22:30:34,170 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-18 22:30:48,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4133980.0, ans=0.05 2024-08-18 22:30:59,618 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-18 22:31:02,380 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12300, loss[loss=0.0983, beats_loss=0.01062, ecapa_loss=0.0001428, whisper_loss=0.08625, over 18246.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001428, whisper_loss=0.08951, over 3889572.77 frames. 
], batch size: 73, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:31:08,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4134080.0, ans=0.125 2024-08-18 22:31:19,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.834e+01 2.309e+01 2.558e+01 2.931e+01 4.229e+01, threshold=5.115e+01, percent-clipped=0.0 2024-08-18 22:31:54,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4134380.0, ans=0.125 2024-08-18 22:32:13,369 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 22:32:16,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12350, loss[loss=0.09876, beats_loss=0.01014, ecapa_loss=0.0001318, whisper_loss=0.0873, over 21846.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01052, ecapa_loss=0.0001438, whisper_loss=0.08978, over 3885148.60 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:32:48,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4134780.0, ans=0.1 2024-08-18 22:32:49,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4134780.0, ans=0.1 2024-08-18 22:32:55,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4134780.0, ans=0.04949747468305833 2024-08-18 22:33:30,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.34 vs. limit=22.5 2024-08-18 22:33:33,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12400, loss[loss=0.1042, beats_loss=0.007899, ecapa_loss=0.0001643, whisper_loss=0.09469, over 17491.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.0001445, whisper_loss=0.08973, over 3864171.39 frames. ], batch size: 69, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:33:40,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4135080.0, ans=0.2 2024-08-18 22:33:52,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.330e+01 2.647e+01 2.859e+01 4.171e+01, threshold=5.294e+01, percent-clipped=0.0 2024-08-18 22:34:44,286 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2024-08-18 22:34:45,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4135480.0, ans=0.125 2024-08-18 22:34:47,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4135480.0, ans=0.07 2024-08-18 22:34:47,960 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 22:34:50,930 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12450, loss[loss=0.09871, beats_loss=0.01241, ecapa_loss=0.0001357, whisper_loss=0.08495, over 18115.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01038, ecapa_loss=0.0001455, whisper_loss=0.09015, over 3860476.27 frames. ], batch size: 73, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:34:54,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4135580.0, ans=0.125 2024-08-18 22:35:01,660 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-18 22:35:12,307 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
15 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-18 22:35:20,974 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-18 22:35:46,838 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-18 22:35:50,673 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-18 22:36:05,303 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-18 22:36:07,681 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12500, loss[loss=0.1014, beats_loss=0.009046, ecapa_loss=0.0001749, whisper_loss=0.09061, over 15345.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.000144, whisper_loss=0.09024, over 3894312.48 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:36:12,530 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-18 22:36:25,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.938e+01 2.340e+01 2.580e+01 2.958e+01 4.069e+01, threshold=5.161e+01, percent-clipped=0.0 2024-08-18 22:36:27,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4136180.0, ans=0.125 2024-08-18 22:36:32,999 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.79 vs. limit=12.0 2024-08-18 22:36:49,770 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-18 22:36:56,546 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. 
limit=15.0 2024-08-18 22:37:02,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2024-08-18 22:37:03,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4136380.0, ans=0.125 2024-08-18 22:37:05,512 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-18 22:37:08,006 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.22 vs. limit=15.0 2024-08-18 22:37:21,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12550, loss[loss=0.1018, beats_loss=0.01225, ecapa_loss=0.0001425, whisper_loss=0.08816, over 23248.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.0001447, whisper_loss=0.09051, over 3909373.69 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:37:21,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4136580.0, ans=0.125 2024-08-18 22:37:23,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4136580.0, ans=0.125 2024-08-18 22:37:31,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4136580.0, ans=0.0 2024-08-18 22:37:35,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. 
limit=15.0 2024-08-18 22:37:38,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4136680.0, ans=0.1 2024-08-18 22:37:42,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4136680.0, ans=0.125 2024-08-18 22:37:45,298 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 22:37:45,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4136680.0, ans=0.125 2024-08-18 22:37:53,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-18 22:37:54,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4136780.0, ans=0.0 2024-08-18 22:37:59,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4136780.0, ans=0.035 2024-08-18 22:37:59,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2024-08-18 22:38:04,302 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-18 22:38:13,927 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 22:38:18,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4136980.0, ans=0.1 2024-08-18 22:38:30,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12600, loss[loss=0.09507, beats_loss=0.009066, ecapa_loss=0.0002173, whisper_loss=0.08383, over 12351.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001442, whisper_loss=0.09035, over 3897515.21 frames. ], batch size: 54, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:38:36,066 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.96 vs. limit=22.5 2024-08-18 22:38:39,987 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=15.0 2024-08-18 22:38:43,475 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 24 from LS+wenet, 9 from Vox, 22 fro AS 2024-08-18 22:38:47,429 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.906e+01 2.167e+01 2.472e+01 2.650e+01 1.085e+02, threshold=4.945e+01, percent-clipped=2.0 2024-08-18 22:38:47,711 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-18 22:38:57,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4137280.0, ans=0.125 2024-08-18 22:38:57,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4137280.0, ans=10.0 2024-08-18 22:39:01,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4137280.0, ans=0.025 2024-08-18 22:39:26,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2024-08-18 22:39:30,531 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-18 22:39:37,671 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12650, loss[loss=0.08743, beats_loss=0.01254, ecapa_loss=0.0001355, whisper_loss=0.07354, over 19221.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01064, ecapa_loss=0.0001437, whisper_loss=0.09037, over 3869517.64 frames. ], batch size: 79, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:39:39,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4137580.0, ans=0.1 2024-08-18 22:39:47,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4137580.0, ans=0.125 2024-08-18 22:39:51,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4137680.0, ans=0.1 2024-08-18 22:40:32,186 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-18 22:40:44,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4138080.0, ans=0.125 2024-08-18 22:40:45,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12700, loss[loss=0.08868, beats_loss=0.01064, ecapa_loss=0.0001383, whisper_loss=0.07666, over 17215.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001432, whisper_loss=0.0902, over 3875986.89 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:41:01,297 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
9 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-18 22:41:03,219 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.392e+01 2.657e+01 3.204e+01 3.516e+02, threshold=5.313e+01, percent-clipped=1.0 2024-08-18 22:41:15,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4138280.0, ans=0.0 2024-08-18 22:41:15,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4138280.0, ans=0.125 2024-08-18 22:41:45,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4138480.0, ans=10.0 2024-08-18 22:41:52,185 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:41:55,493 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12750, loss[loss=0.09371, beats_loss=0.008359, ecapa_loss=0.0001394, whisper_loss=0.08395, over 14982.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0106, ecapa_loss=0.0001436, whisper_loss=0.09035, over 3885022.93 frames. ], batch size: 57, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:41:57,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4138580.0, ans=0.125 2024-08-18 22:41:59,523 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 17 from LS+wenet, 15 from Vox, 52 fro AS 2024-08-18 22:42:08,239 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=9.144e+00 2024-08-18 22:42:21,400 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-18 22:42:30,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.27 vs. 
limit=6.0 2024-08-18 22:42:40,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4138880.0, ans=0.0 2024-08-18 22:43:04,012 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12800, loss[loss=0.1108, beats_loss=0.01193, ecapa_loss=0.0001501, whisper_loss=0.09742, over 22817.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.0001453, whisper_loss=0.08988, over 3867792.83 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:43:16,111 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2024-08-18 22:43:20,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.353e+01 2.654e+01 3.011e+01 1.383e+02, threshold=5.309e+01, percent-clipped=3.0 2024-08-18 22:43:33,371 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-18 22:43:41,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4139280.0, ans=0.125 2024-08-18 22:43:55,952 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-18 22:44:01,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4139480.0, ans=0.125 2024-08-18 22:44:02,022 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2024-08-18 22:44:09,094 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-18 22:44:10,623 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12850, loss[loss=0.08261, beats_loss=0.01142, ecapa_loss=0.0001256, whisper_loss=0.06994, over 16491.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001442, whisper_loss=0.08989, over 3860571.00 frames. ], batch size: 66, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:44:11,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4139580.0, ans=0.2 2024-08-18 22:45:01,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.49 vs. limit=6.0 2024-08-18 22:45:11,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0 2024-08-18 22:45:12,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4139980.0, ans=0.07 2024-08-18 22:45:18,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12900, loss[loss=0.117, beats_loss=0.009444, ecapa_loss=0.0001684, whisper_loss=0.1059, over 21966.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001444, whisper_loss=0.09018, over 3867502.56 frames. ], batch size: 91, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:45:25,132 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4140080.0, ans=0.125 2024-08-18 22:45:35,660 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.698e+01 2.228e+01 2.406e+01 2.715e+01 2.764e+02, threshold=4.812e+01, percent-clipped=1.0 2024-08-18 22:45:43,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4140180.0, ans=0.95 2024-08-18 22:45:44,203 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-18 22:45:50,623 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 22:45:58,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.69 vs. limit=22.5 2024-08-18 22:46:00,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4140380.0, ans=0.015 2024-08-18 22:46:05,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4140380.0, ans=0.125 2024-08-18 22:46:09,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4140380.0, ans=0.0 2024-08-18 22:46:17,363 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-18 22:46:27,714 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 12950, loss[loss=0.1022, beats_loss=0.00917, ecapa_loss=0.0001185, whisper_loss=0.09183, over 17460.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.000145, whisper_loss=0.08971, over 3862994.95 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:46:49,108 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 26 from LS+wenet, 14 from Vox, 18 fro AS 2024-08-18 22:46:52,809 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-18 22:47:19,891 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-18 22:47:35,813 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13000, loss[loss=0.1111, beats_loss=0.01046, ecapa_loss=0.0001284, whisper_loss=0.09938, over 17482.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001455, whisper_loss=0.09042, over 3847408.12 frames. ], batch size: 67, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:47:38,580 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
18 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 22:47:41,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4141080.0, ans=0.0 2024-08-18 22:47:45,314 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-18 22:47:47,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4141080.0, ans=0.2 2024-08-18 22:47:48,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4141180.0, ans=0.0 2024-08-18 22:47:51,956 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.390e+01 2.751e+01 3.227e+01 4.665e+01, threshold=5.502e+01, percent-clipped=0.0 2024-08-18 22:47:56,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4141180.0, ans=0.2 2024-08-18 22:48:02,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4141280.0, ans=0.0 2024-08-18 22:48:04,087 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-18 22:48:14,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2024-08-18 22:48:15,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.58 vs. 
limit=15.0 2024-08-18 22:48:16,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=4141380.0, ans=0.025 2024-08-18 22:48:39,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4141480.0, ans=0.125 2024-08-18 22:48:42,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.22 vs. limit=22.5 2024-08-18 22:48:44,335 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13050, loss[loss=0.1147, beats_loss=0.01048, ecapa_loss=0.0001101, whisper_loss=0.1031, over 24066.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.000143, whisper_loss=0.09077, over 3868078.09 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:48:46,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=4141580.0, ans=15.0 2024-08-18 22:48:59,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4141680.0, ans=10.0 2024-08-18 22:49:01,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4141680.0, ans=0.125 2024-08-18 22:49:06,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4141680.0, ans=0.125 2024-08-18 22:49:13,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4141780.0, ans=0.125 2024-08-18 22:49:36,118 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
20 from LS+wenet, 17 from Vox, 52 fro AS 2024-08-18 22:49:55,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13100, loss[loss=0.07267, beats_loss=0.01111, ecapa_loss=0.0001457, whisper_loss=0.0601, over 13873.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01042, ecapa_loss=0.0001428, whisper_loss=0.09068, over 3836089.57 frames. ], batch size: 56, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:50:05,990 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.30 vs. limit=15.0 2024-08-18 22:50:08,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4142080.0, ans=0.125 2024-08-18 22:50:13,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4142180.0, ans=0.125 2024-08-18 22:50:14,316 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.842e+01 2.210e+01 2.445e+01 2.730e+01 3.721e+01, threshold=4.891e+01, percent-clipped=0.0 2024-08-18 22:50:22,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.75 vs. limit=15.0 2024-08-18 22:50:28,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4142280.0, ans=0.0 2024-08-18 22:50:31,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4142280.0, ans=0.2 2024-08-18 22:50:38,799 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
31 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 22:50:45,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4142380.0, ans=0.2 2024-08-18 22:50:52,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4142380.0, ans=0.0 2024-08-18 22:50:53,818 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 22:51:09,001 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 15 from LS+wenet, 28 from Vox, 45 fro AS 2024-08-18 22:51:10,357 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13150, loss[loss=0.07013, beats_loss=0.01386, ecapa_loss=0.0001469, whisper_loss=0.0548, over 20289.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001422, whisper_loss=0.09037, over 3860522.03 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:51:13,777 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-18 22:51:15,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.79 vs. limit=22.5 2024-08-18 22:51:23,641 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
23 from LS+wenet, 10 from Vox, 31 fro AS 2024-08-18 22:51:26,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4142680.0, ans=0.0 2024-08-18 22:51:42,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4142780.0, ans=0.5 2024-08-18 22:52:11,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4142980.0, ans=0.125 2024-08-18 22:52:23,477 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13200, loss[loss=0.08133, beats_loss=0.01229, ecapa_loss=0.0001424, whisper_loss=0.06761, over 21105.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001434, whisper_loss=0.09053, over 3833964.77 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:52:23,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4143080.0, ans=0.125 2024-08-18 22:52:29,241 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-18 22:52:29,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4143080.0, ans=0.1 2024-08-18 22:52:29,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4143080.0, ans=0.0 2024-08-18 22:52:32,437 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-18 22:52:39,721 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.334e+01 2.573e+01 2.889e+01 3.948e+01, threshold=5.147e+01, percent-clipped=0.0 2024-08-18 22:52:51,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4143280.0, ans=0.0 2024-08-18 22:52:59,633 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4143280.0, ans=0.04949747468305833 2024-08-18 22:53:15,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4143380.0, ans=0.2 2024-08-18 22:53:17,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4143380.0, ans=0.125 2024-08-18 22:53:23,846 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 22:53:29,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4143480.0, ans=0.04949747468305833 2024-08-18 22:53:32,942 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13250, loss[loss=0.1176, beats_loss=0.01047, ecapa_loss=0.0001154, whisper_loss=0.106, over 20868.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01041, ecapa_loss=0.000143, whisper_loss=0.09094, over 3813150.99 frames. ], batch size: 80, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:53:41,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4143580.0, ans=0.125 2024-08-18 22:54:08,001 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=15.0 2024-08-18 22:54:08,561 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-18 22:54:12,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4143780.0, ans=0.125 2024-08-18 22:54:16,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4143880.0, ans=10.0 2024-08-18 22:54:18,735 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 17 from Vox, 17 fro AS 2024-08-18 22:54:32,750 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-18 22:54:41,840 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 24 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-18 22:54:46,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13300, loss[loss=0.1072, beats_loss=0.009549, ecapa_loss=0.0001462, whisper_loss=0.09618, over 20242.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01032, ecapa_loss=0.0001434, whisper_loss=0.09139, over 3832562.12 frames. ], batch size: 81, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:54:50,028 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-18 22:55:02,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4144180.0, ans=0.07 2024-08-18 22:55:02,712 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.713e+01 2.463e+01 2.740e+01 3.023e+01 1.147e+02, threshold=5.480e+01, percent-clipped=2.0 2024-08-18 22:55:15,937 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-18 22:55:35,610 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 15 from LS+wenet, 26 from Vox, 19 fro AS 2024-08-18 22:55:36,208 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. 
limit=15.0 2024-08-18 22:55:49,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4144480.0, ans=0.0 2024-08-18 22:55:56,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4144480.0, ans=0.125 2024-08-18 22:55:59,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13350, loss[loss=0.1264, beats_loss=0.009499, ecapa_loss=9.42e-05, whisper_loss=0.1159, over 17900.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001435, whisper_loss=0.09087, over 3835714.73 frames. ], batch size: 63, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:55:59,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4144580.0, ans=0.1 2024-08-18 22:56:06,222 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-18 22:56:15,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4144680.0, ans=0.0 2024-08-18 22:56:28,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4144780.0, ans=0.1 2024-08-18 22:56:29,860 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 22:56:33,248 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-08-18 22:56:39,797 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 22:56:40,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4144880.0, ans=0.125 2024-08-18 22:56:47,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4144880.0, ans=0.2 2024-08-18 22:56:48,195 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2024-08-18 22:56:57,951 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-08-18 22:57:07,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4144980.0, ans=0.125 2024-08-18 22:57:09,993 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13400, loss[loss=0.117, beats_loss=0.009064, ecapa_loss=0.0001813, whisper_loss=0.1062, over 16991.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.0104, ecapa_loss=0.0001439, whisper_loss=0.09059, over 3852147.97 frames. ], batch size: 69, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:57:22,097 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
18 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-18 22:57:25,917 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.309e+01 2.613e+01 2.893e+01 4.161e+01, threshold=5.227e+01, percent-clipped=0.0 2024-08-18 22:57:32,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4145180.0, ans=0.125 2024-08-18 22:57:49,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4145280.0, ans=0.0 2024-08-18 22:57:55,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4145380.0, ans=0.1 2024-08-18 22:58:02,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4145380.0, ans=0.1 2024-08-18 22:58:03,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4145480.0, ans=0.0 2024-08-18 22:58:06,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.35 vs. limit=15.0 2024-08-18 22:58:17,576 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13450, loss[loss=0.1129, beats_loss=0.01185, ecapa_loss=0.0001562, whisper_loss=0.09947, over 22121.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001437, whisper_loss=0.0901, over 3834291.78 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:58:22,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4145580.0, ans=0.125 2024-08-18 22:58:33,630 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
35 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-18 22:58:36,630 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4145680.0, ans=0.125 2024-08-18 22:58:36,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4145680.0, ans=10.0 2024-08-18 22:58:38,937 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 22:58:39,641 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.028e+01 2024-08-18 22:58:40,523 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-18 22:58:50,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.53 vs. limit=15.0 2024-08-18 22:58:53,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4145780.0, ans=0.0 2024-08-18 22:58:53,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4145780.0, ans=0.125 2024-08-18 22:58:57,513 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-18 22:58:57,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4145880.0, ans=0.125 2024-08-18 22:59:25,125 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13500, loss[loss=0.1073, beats_loss=0.009097, ecapa_loss=0.0001822, whisper_loss=0.09638, over 21649.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01036, ecapa_loss=0.0001439, whisper_loss=0.09071, over 3858131.51 frames. 
], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 22:59:42,038 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.292e+01 2.450e+01 2.682e+01 3.330e+01, threshold=4.901e+01, percent-clipped=0.0 2024-08-18 22:59:47,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4146180.0, ans=0.125 2024-08-18 22:59:52,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4146280.0, ans=0.125 2024-08-18 22:59:58,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4146280.0, ans=0.125 2024-08-18 23:00:03,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4146280.0, ans=0.2 2024-08-18 23:00:09,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4146380.0, ans=0.125 2024-08-18 23:00:32,317 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13550, loss[loss=0.09601, beats_loss=0.009808, ecapa_loss=0.0001469, whisper_loss=0.08473, over 14970.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001444, whisper_loss=0.0901, over 3838022.49 frames. ], batch size: 61, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:00:49,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4146680.0, ans=0.0 2024-08-18 23:00:52,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4146680.0, ans=0.0 2024-08-18 23:01:24,313 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 23:01:28,230 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
19 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-18 23:01:41,365 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13600, loss[loss=0.09987, beats_loss=0.009773, ecapa_loss=0.0001662, whisper_loss=0.08844, over 20207.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01047, ecapa_loss=0.0001434, whisper_loss=0.08922, over 3843321.44 frames. ], batch size: 88, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:01:46,013 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2024-08-18 23:01:48,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4147080.0, ans=0.0 2024-08-18 23:01:53,705 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 27 from Vox, 15 fro AS 2024-08-18 23:01:54,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4147180.0, ans=0.125 2024-08-18 23:01:55,527 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-18 23:01:57,883 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.313e+01 2.521e+01 2.832e+01 3.898e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-18 23:02:01,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4147180.0, ans=0.1 2024-08-18 23:02:11,210 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.35 vs. 
limit=6.0 2024-08-18 23:02:17,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4147280.0, ans=0.0 2024-08-18 23:02:17,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4147280.0, ans=0.125 2024-08-18 23:02:19,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4147280.0, ans=0.125 2024-08-18 23:02:31,564 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 11 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 23:02:49,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13650, loss[loss=0.1108, beats_loss=0.01062, ecapa_loss=0.0001423, whisper_loss=0.09877, over 22504.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0105, ecapa_loss=0.0001434, whisper_loss=0.0898, over 3848238.88 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:02:54,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. limit=6.0 2024-08-18 23:03:00,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4147580.0, ans=0.2 2024-08-18 23:03:20,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4147780.0, ans=0.05 2024-08-18 23:03:27,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4147780.0, ans=0.125 2024-08-18 23:03:30,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2024-08-18 23:03:32,589 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-18 23:03:36,088 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2024-08-18 23:03:56,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13700, loss[loss=0.122, beats_loss=0.00999, ecapa_loss=0.0001407, whisper_loss=0.1106, over 20674.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001433, whisper_loss=0.08915, over 3858359.34 frames. ], batch size: 80, lr: 2.17e-03, grad_scale: 2.8823037615171174e+17 2024-08-18 23:04:00,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.04 vs. limit=6.0 2024-08-18 23:04:01,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4148080.0, ans=0.125 2024-08-18 23:04:11,952 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.317e+01 2.575e+01 2.845e+01 4.715e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-18 23:04:14,577 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-18 23:04:20,198 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-18 23:04:22,133 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.594e-03 2024-08-18 23:04:26,133 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4148280.0, ans=0.125 2024-08-18 23:04:28,614 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4148280.0, ans=0.125 2024-08-18 23:04:45,004 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
18 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-18 23:05:01,941 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13750, loss[loss=0.1066, beats_loss=0.008013, ecapa_loss=0.0001565, whisper_loss=0.09704, over 14029.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01052, ecapa_loss=0.0001429, whisper_loss=0.08983, over 3857884.78 frames. ], batch size: 54, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:05:21,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=12.0 2024-08-18 23:05:37,339 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 17 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-18 23:05:37,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4148780.0, ans=0.1 2024-08-18 23:05:43,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4148880.0, ans=0.0 2024-08-18 23:05:50,357 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-18 23:05:51,418 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06066558510065079, model_norm_threshold=51.49205017089844 2024-08-18 23:05:51,588 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.619e+05, grad_sumsq=1.558e+07, orig_rms_sq=1.039e-02 2024-08-18 23:06:08,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13800, loss[loss=0.1083, beats_loss=0.01135, ecapa_loss=0.0001667, whisper_loss=0.0953, over 22739.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001426, whisper_loss=0.0894, over 3830497.41 frames. ], batch size: 93, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:06:08,402 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
21 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-18 23:06:18,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4149080.0, ans=0.1 2024-08-18 23:06:25,233 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.320e+01 2.516e+01 2.780e+01 8.488e+02, threshold=5.033e+01, percent-clipped=1.0 2024-08-18 23:06:33,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4149180.0, ans=0.125 2024-08-18 23:06:34,156 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2024-08-18 23:06:35,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4149280.0, ans=0.125 2024-08-18 23:06:59,403 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4149380.0, ans=0.0 2024-08-18 23:07:14,674 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13850, loss[loss=0.09654, beats_loss=0.01132, ecapa_loss=0.0001373, whisper_loss=0.08385, over 20431.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001428, whisper_loss=0.0897, over 3863379.22 frames. ], batch size: 86, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:07:53,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4149780.0, ans=0.125 2024-08-18 23:07:56,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4149880.0, ans=0.0 2024-08-18 23:08:23,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13900, loss[loss=0.1081, beats_loss=0.009595, ecapa_loss=0.0001255, whisper_loss=0.09723, over 16165.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.000142, whisper_loss=0.09001, over 3881269.92 frames. ], batch size: 60, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:08:28,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4150080.0, ans=0.125 2024-08-18 23:08:35,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4150080.0, ans=0.2 2024-08-18 23:08:40,368 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.243e+01 2.562e+01 2.808e+01 4.472e+01, threshold=5.123e+01, percent-clipped=0.0 2024-08-18 23:08:46,922 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-18 23:08:47,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4150180.0, ans=0.125 2024-08-18 23:08:56,414 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-18 23:09:31,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 13950, loss[loss=0.09933, beats_loss=0.01216, ecapa_loss=0.0001549, whisper_loss=0.08562, over 16675.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001423, whisper_loss=0.09045, over 3895679.72 frames. ], batch size: 68, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:09:48,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4150680.0, ans=0.04949747468305833 2024-08-18 23:09:48,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4150680.0, ans=0.0 2024-08-18 23:10:01,536 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 23:10:05,418 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-18 23:10:05,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4150780.0, ans=0.2 2024-08-18 23:10:30,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4150980.0, ans=0.2 2024-08-18 23:10:35,493 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-18 23:10:38,881 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14000, loss[loss=0.0998, beats_loss=0.01117, ecapa_loss=0.0001341, whisper_loss=0.08729, over 22615.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001415, whisper_loss=0.09065, over 3851483.96 frames. ], batch size: 89, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:10:41,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4151080.0, ans=0.125 2024-08-18 23:10:52,130 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-18 23:10:53,310 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 31 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-18 23:10:56,369 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.301e+01 2.559e+01 2.803e+01 3.762e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-18 23:10:56,626 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-18 23:10:59,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4151180.0, ans=0.125 2024-08-18 23:11:02,954 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
33 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-18 23:11:16,764 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 23:11:22,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4151380.0, ans=0.125 2024-08-18 23:11:28,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4151380.0, ans=0.125 2024-08-18 23:11:30,738 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-18 23:11:46,881 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.95 vs. limit=15.0 2024-08-18 23:11:49,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14050, loss[loss=0.09029, beats_loss=0.009349, ecapa_loss=0.0001819, whisper_loss=0.07912, over 17338.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01047, ecapa_loss=0.000142, whisper_loss=0.09108, over 3879473.52 frames. ], batch size: 70, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:11:51,137 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-18 23:12:03,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4151680.0, ans=0.125 2024-08-18 23:12:07,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4151680.0, ans=0.0 2024-08-18 23:12:08,163 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 23:12:13,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4151680.0, ans=0.1 2024-08-18 23:12:16,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4151780.0, ans=0.125 2024-08-18 23:12:20,072 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-18 23:12:24,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4151780.0, ans=0.0 2024-08-18 23:12:25,560 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 33 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-18 23:12:28,569 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 10 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 23:12:59,929 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14100, loss[loss=0.08445, beats_loss=0.01123, ecapa_loss=0.0001495, whisper_loss=0.07172, over 17752.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01049, ecapa_loss=0.0001411, whisper_loss=0.09132, over 3842693.05 frames. ], batch size: 71, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:13:17,466 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.301e+01 2.548e+01 2.823e+01 5.332e+01, threshold=5.096e+01, percent-clipped=1.0 2024-08-18 23:13:28,912 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 23:13:31,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4152280.0, ans=0.125 2024-08-18 23:13:33,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4152280.0, ans=0.125 2024-08-18 23:13:37,509 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
25 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-18 23:13:39,063 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-18 23:14:12,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14150, loss[loss=0.1009, beats_loss=0.009559, ecapa_loss=0.0001482, whisper_loss=0.08984, over 16886.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01051, ecapa_loss=0.0001416, whisper_loss=0.091, over 3861366.09 frames. ], batch size: 68, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:14:23,501 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.61 vs. limit=6.0 2024-08-18 23:14:26,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4152680.0, ans=0.0 2024-08-18 23:14:40,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4152780.0, ans=0.1 2024-08-18 23:14:53,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4152780.0, ans=0.1 2024-08-18 23:15:03,373 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.39 vs. limit=15.0 2024-08-18 23:15:06,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=4152880.0, ans=12.0 2024-08-18 23:15:09,104 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
17 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-18 23:15:16,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4152980.0, ans=15.0 2024-08-18 23:15:27,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14200, loss[loss=0.1026, beats_loss=0.01155, ecapa_loss=0.0001307, whisper_loss=0.08978, over 22597.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01056, ecapa_loss=0.000141, whisper_loss=0.09091, over 3870212.23 frames. ], batch size: 90, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:15:45,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.295e+01 2.597e+01 2.935e+01 2.411e+02, threshold=5.195e+01, percent-clipped=3.0 2024-08-18 23:16:01,912 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-18 23:16:06,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4153280.0, ans=0.125 2024-08-18 23:16:16,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4153380.0, ans=0.1 2024-08-18 23:16:36,495 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-18 23:16:40,952 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14250, loss[loss=0.1092, beats_loss=0.01024, ecapa_loss=0.0001335, whisper_loss=0.09762, over 18513.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01051, ecapa_loss=0.0001403, whisper_loss=0.09133, over 3875876.63 frames. 
], batch size: 73, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:16:53,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4153580.0, ans=0.125 2024-08-18 23:16:57,938 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.83 vs. limit=12.0 2024-08-18 23:17:26,755 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5 2024-08-18 23:17:54,772 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14300, loss[loss=0.1016, beats_loss=0.01161, ecapa_loss=0.0001105, whisper_loss=0.08892, over 21472.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0106, ecapa_loss=0.0001402, whisper_loss=0.09076, over 3880199.91 frames. ], batch size: 83, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:17:55,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4154080.0, ans=0.125 2024-08-18 23:17:55,255 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4154080.0, ans=0.0 2024-08-18 23:17:58,847 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-18 23:18:12,372 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.299e+01 2.504e+01 2.790e+01 4.018e+02, threshold=5.008e+01, percent-clipped=2.0 2024-08-18 23:18:34,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-08-18 23:18:47,515 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
22 from LS+wenet, 18 from Vox, 41 fro AS 2024-08-18 23:18:53,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4154480.0, ans=0.04949747468305833 2024-08-18 23:18:59,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4154480.0, ans=0.125 2024-08-18 23:19:04,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14350, loss[loss=0.1185, beats_loss=0.007271, ecapa_loss=0.0001858, whisper_loss=0.1094, over 17413.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01056, ecapa_loss=0.0001413, whisper_loss=0.09026, over 3898866.19 frames. ], batch size: 72, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:19:19,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4154680.0, ans=0.1 2024-08-18 23:19:31,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4154780.0, ans=0.0 2024-08-18 23:19:39,076 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-18 23:19:57,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4154880.0, ans=0.0 2024-08-18 23:20:02,550 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-18 23:20:17,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14400, loss[loss=0.09414, beats_loss=0.01043, ecapa_loss=0.0001488, whisper_loss=0.08222, over 22880.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01038, ecapa_loss=0.0001432, whisper_loss=0.09127, over 3867512.88 frames. ], batch size: 92, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:20:17,477 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-18 23:20:25,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4155080.0, ans=0.1 2024-08-18 23:20:34,734 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.269e+01 2.603e+01 3.055e+01 4.750e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-18 23:20:37,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4155180.0, ans=0.2 2024-08-18 23:20:43,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4155180.0, ans=0.2 2024-08-18 23:20:43,983 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2024-08-18 23:21:31,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4155580.0, ans=0.0 2024-08-18 23:21:32,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 28, batch 14450, loss[loss=0.08069, beats_loss=0.01179, ecapa_loss=0.0001528, whisper_loss=0.06737, over 18951.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001435, whisper_loss=0.09095, over 3872054.02 frames. ], batch size: 80, lr: 2.17e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:21:55,336 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-18 23:22:21,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4155880.0, ans=0.2 2024-08-18 23:23:31,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 0, loss[loss=0.09153, beats_loss=0.01107, ecapa_loss=0.0001514, whisper_loss=0.07895, over 22930.00 frames. 
], tot_loss[loss=0.09153, beats_loss=0.01107, ecapa_loss=0.0001514, whisper_loss=0.07895, over 22930.00 frames. ], batch size: 93, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:23:31,035 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-18 23:24:08,497 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005265, whisper_loss=0.2475, over 922467.00 frames. 2024-08-18 23:24:23,780 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on SV_voxceleb1: loss=0.004049, beats_loss=0, ecapa_loss=0.0004049, whisper_loss=0, over 939242.00 frames. 2024-08-18 23:26:08,934 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on AT_audioset: loss=0.02325, beats_loss=0.02325, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-18 23:26:08,942 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-18 23:26:31,995 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-18 23:26:38,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4156170.0, ans=0.125 2024-08-18 23:26:41,484 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.401e+01 2.612e+01 3.065e+01 1.665e+02, threshold=5.224e+01, percent-clipped=1.0 2024-08-18 23:26:43,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4156170.0, ans=0.0 2024-08-18 23:28:10,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 50, loss[loss=0.08281, beats_loss=0.009626, ecapa_loss=0.0001318, whisper_loss=0.07187, over 18033.00 frames. ], tot_loss[loss=0.09852, beats_loss=0.009466, ecapa_loss=0.0001468, whisper_loss=0.08758, over 896934.81 frames. 
], batch size: 71, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:28:12,915 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-18 23:28:14,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4156570.0, ans=0.125 2024-08-18 23:28:36,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4156670.0, ans=0.1 2024-08-18 23:28:45,315 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 24 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-18 23:28:50,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4156670.0, ans=0.125 2024-08-18 23:29:14,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4156770.0, ans=0.0 2024-08-18 23:29:47,265 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-18 23:29:49,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4156970.0, ans=0.0 2024-08-18 23:29:59,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4156970.0, ans=0.125 2024-08-18 23:30:01,591 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 100, loss[loss=0.1033, beats_loss=0.009404, ecapa_loss=0.0001322, whisper_loss=0.09258, over 23674.00 frames. ], tot_loss[loss=0.09922, beats_loss=0.009274, ecapa_loss=0.0001462, whisper_loss=0.08848, over 1578280.25 frames. 
], batch size: 90, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:30:11,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4157070.0, ans=0.125 2024-08-18 23:30:30,205 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.931e+01 2.602e+01 2.871e+01 3.209e+01 4.618e+01, threshold=5.741e+01, percent-clipped=0.0 2024-08-18 23:30:34,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4157170.0, ans=0.0 2024-08-18 23:31:22,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4157370.0, ans=0.125 2024-08-18 23:31:28,092 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 23:31:35,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4157470.0, ans=0.05 2024-08-18 23:31:42,672 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 150, loss[loss=0.1023, beats_loss=0.01059, ecapa_loss=0.0001389, whisper_loss=0.09034, over 22045.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.009331, ecapa_loss=0.0001449, whisper_loss=0.09001, over 2087643.59 frames. ], batch size: 88, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:32:09,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4157670.0, ans=0.125 2024-08-18 23:32:21,866 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 23:32:24,562 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
13 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-18 23:32:25,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4157770.0, ans=0.05 2024-08-18 23:33:01,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 200, loss[loss=0.1046, beats_loss=0.01056, ecapa_loss=0.000128, whisper_loss=0.09272, over 19086.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.009559, ecapa_loss=0.0001459, whisper_loss=0.0906, over 2480593.43 frames. ], batch size: 74, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:33:02,515 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05759067460894585, model_norm_threshold=57.41115951538086 2024-08-18 23:33:02,686 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.48, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.794e+05, grad_sumsq=4.619e+07, orig_rms_sq=1.038e-02 2024-08-18 23:33:12,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4158070.0, ans=0.125 2024-08-18 23:33:19,216 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4158170.0, ans=0.0 2024-08-18 23:33:19,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.990e+01 2.401e+01 2.608e+01 2.915e+01 9.969e+02, threshold=5.216e+01, percent-clipped=2.0 2024-08-18 23:33:32,058 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=12.0 2024-08-18 23:33:32,873 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 11 from Vox, 34 fro AS 2024-08-18 23:34:11,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 250, loss[loss=0.1158, beats_loss=0.009407, ecapa_loss=0.00012, whisper_loss=0.1052, over 21331.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.009874, ecapa_loss=0.0001438, whisper_loss=0.09018, over 2739186.47 frames. ], batch size: 80, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:34:29,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4158670.0, ans=0.125 2024-08-18 23:34:45,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4158770.0, ans=0.02 2024-08-18 23:35:03,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4158870.0, ans=0.125 2024-08-18 23:35:10,876 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 23:35:18,468 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 300, loss[loss=0.1029, beats_loss=0.01168, ecapa_loss=0.0001359, whisper_loss=0.08987, over 22486.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01004, ecapa_loss=0.000142, whisper_loss=0.09033, over 2970755.47 frames. ], batch size: 89, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:35:22,660 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-18 23:35:30,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.69 vs. limit=15.0 2024-08-18 23:35:34,183 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. 
limit=15.0 2024-08-18 23:35:35,799 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.277e+01 2.615e+01 3.023e+01 6.948e+01, threshold=5.231e+01, percent-clipped=1.0 2024-08-18 23:36:09,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4159370.0, ans=0.125 2024-08-18 23:36:13,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4159470.0, ans=0.0 2024-08-18 23:36:21,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4159470.0, ans=0.125 2024-08-18 23:36:21,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4159470.0, ans=0.125 2024-08-18 23:36:26,459 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 350, loss[loss=0.09702, beats_loss=0.009503, ecapa_loss=0.0001487, whisper_loss=0.08603, over 18727.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01013, ecapa_loss=0.0001421, whisper_loss=0.0898, over 3169552.53 frames. ], batch size: 74, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:36:32,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4159570.0, ans=0.125 2024-08-18 23:36:33,948 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
34 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 23:36:34,412 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4159570.0, ans=0.0 2024-08-18 23:36:41,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4159670.0, ans=0.2 2024-08-18 23:36:57,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4159770.0, ans=0.0 2024-08-18 23:36:57,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4159770.0, ans=0.125 2024-08-18 23:37:12,531 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-18 23:37:17,189 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 11 from Vox, 36 fro AS 2024-08-18 23:37:25,103 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-18 23:37:31,820 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-18 23:37:37,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4160070.0, ans=0.125 2024-08-18 23:37:38,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 400, loss[loss=0.1025, beats_loss=0.01019, ecapa_loss=0.0001208, whisper_loss=0.09113, over 16059.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01026, ecapa_loss=0.0001412, whisper_loss=0.08963, over 3338922.76 frames. ], batch size: 62, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:37:40,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=12.0 2024-08-18 23:37:52,763 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
20 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 23:37:55,255 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.335e+01 2.628e+01 2.920e+01 4.432e+01, threshold=5.257e+01, percent-clipped=0.0 2024-08-18 23:38:03,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4160270.0, ans=0.125 2024-08-18 23:38:05,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=22.5 2024-08-18 23:38:10,479 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 23:38:14,417 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-18 23:38:18,557 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-18 23:38:26,061 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-18 23:38:26,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4160370.0, ans=0.125 2024-08-18 23:38:33,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4160470.0, ans=0.125 2024-08-18 23:38:34,533 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:38:34,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4160470.0, ans=0.1 2024-08-18 23:38:35,557 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-18 23:38:46,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 450, loss[loss=0.09335, beats_loss=0.01114, ecapa_loss=0.0001309, whisper_loss=0.0809, over 18528.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01026, ecapa_loss=0.0001427, whisper_loss=0.08902, over 3440380.95 frames. ], batch size: 72, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:38:57,979 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-18 23:39:04,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4160670.0, ans=0.2 2024-08-18 23:39:11,206 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-18 23:39:16,099 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2024-08-18 23:39:55,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 500, loss[loss=0.1019, beats_loss=0.01074, ecapa_loss=0.000143, whisper_loss=0.08977, over 18690.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01027, ecapa_loss=0.0001443, whisper_loss=0.0886, over 3522268.49 frames. ], batch size: 74, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:39:59,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.36 vs. limit=15.0 2024-08-18 23:40:04,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-08-18 23:40:06,670 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-18 23:40:13,472 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.857e+01 2.408e+01 2.730e+01 3.089e+01 1.165e+02, threshold=5.459e+01, percent-clipped=2.0 2024-08-18 23:40:13,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=4161170.0, ans=0.5 2024-08-18 23:40:34,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4161270.0, ans=0.0 2024-08-18 23:40:45,792 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:41:05,126 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 550, loss[loss=0.1005, beats_loss=0.01034, ecapa_loss=0.000143, whisper_loss=0.08876, over 22167.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01018, ecapa_loss=0.000143, whisper_loss=0.08925, over 3612027.19 frames. ], batch size: 89, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:41:18,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.54 vs. limit=6.0 2024-08-18 23:41:28,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4161670.0, ans=0.125 2024-08-18 23:41:35,566 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-18 23:41:40,513 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 26 from Vox, 34 fro AS 2024-08-18 23:41:44,701 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 19 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-18 23:42:12,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 600, loss[loss=0.107, beats_loss=0.009713, ecapa_loss=0.0001283, whisper_loss=0.09602, over 22829.00 frames. 
], tot_loss[loss=0.1011, beats_loss=0.01029, ecapa_loss=0.0001415, whisper_loss=0.08939, over 3665506.82 frames. ], batch size: 92, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:42:22,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4162070.0, ans=0.125 2024-08-18 23:42:22,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4162070.0, ans=0.125 2024-08-18 23:42:30,323 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.259e+01 2.406e+01 2.695e+01 1.044e+02, threshold=4.812e+01, percent-clipped=1.0 2024-08-18 23:42:38,863 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-18 23:43:08,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4162470.0, ans=0.035 2024-08-18 23:43:14,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4162470.0, ans=0.2 2024-08-18 23:43:14,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4162470.0, ans=0.125 2024-08-18 23:43:19,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 650, loss[loss=0.1004, beats_loss=0.01136, ecapa_loss=0.0001336, whisper_loss=0.0877, over 22522.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.0103, ecapa_loss=0.000142, whisper_loss=0.08892, over 3703261.10 frames. ], batch size: 91, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:43:26,717 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-18 23:43:32,212 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-18 23:43:34,780 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
19 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-18 23:44:04,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4162870.0, ans=0.0 2024-08-18 23:44:07,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4162870.0, ans=0.2 2024-08-18 23:44:10,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4162870.0, ans=0.1 2024-08-18 23:44:12,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4162970.0, ans=0.0 2024-08-18 23:44:27,116 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 700, loss[loss=0.1091, beats_loss=0.01076, ecapa_loss=0.0001491, whisper_loss=0.09688, over 19555.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01034, ecapa_loss=0.0001422, whisper_loss=0.08906, over 3748249.49 frames. ], batch size: 77, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:44:38,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4163070.0, ans=0.5 2024-08-18 23:44:44,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.232e+01 2.495e+01 2.687e+01 4.242e+01, threshold=4.990e+01, percent-clipped=0.0 2024-08-18 23:45:04,481 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-18 23:45:19,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4163370.0, ans=0.0 2024-08-18 23:45:29,317 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
33 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-18 23:45:34,245 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 750, loss[loss=0.117, beats_loss=0.008642, ecapa_loss=0.0001437, whisper_loss=0.1069, over 14264.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0104, ecapa_loss=0.0001415, whisper_loss=0.08903, over 3767381.71 frames. ], batch size: 56, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:45:36,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4163570.0, ans=0.2 2024-08-18 23:45:37,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4163570.0, ans=0.0 2024-08-18 23:45:44,351 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-18 23:45:58,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4163670.0, ans=0.125 2024-08-18 23:46:00,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4163770.0, ans=0.0 2024-08-18 23:46:07,797 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 23:46:36,658 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-18 23:46:42,150 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 800, loss[loss=0.06617, beats_loss=0.01618, ecapa_loss=0.0001039, whisper_loss=0.04894, over 17312.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0104, ecapa_loss=0.000141, whisper_loss=0.08935, over 3776641.32 frames. ], batch size: 69, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:46:42,327 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-18 23:46:56,732 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
20 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-18 23:46:59,126 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.767e+01 2.180e+01 2.421e+01 2.736e+01 5.483e+01, threshold=4.843e+01, percent-clipped=1.0 2024-08-18 23:47:03,611 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 25 from LS+wenet, 16 from Vox, 14 fro AS 2024-08-18 23:47:05,534 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4164170.0, ans=0.2 2024-08-18 23:47:12,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.37 vs. limit=22.5 2024-08-18 23:47:15,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=12.0 2024-08-18 23:47:19,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4164270.0, ans=0.2 2024-08-18 23:47:30,422 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-18 23:47:37,764 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-18 23:47:43,295 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-18 23:47:49,600 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 850, loss[loss=0.09429, beats_loss=0.007092, ecapa_loss=0.000151, whisper_loss=0.08569, over 14983.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01035, ecapa_loss=0.0001418, whisper_loss=0.0888, over 3789175.91 frames. ], batch size: 60, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:47:50,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4164570.0, ans=0.0 2024-08-18 23:47:54,216 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
31 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-18 23:48:09,326 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 13 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-18 23:48:13,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4164670.0, ans=0.125 2024-08-18 23:48:15,518 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.65 vs. limit=22.5 2024-08-18 23:48:29,476 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-18 23:48:32,420 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-18 23:48:37,910 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-18 23:48:48,132 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 22 from Vox, 16 fro AS 2024-08-18 23:48:57,716 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 900, loss[loss=0.09225, beats_loss=0.01228, ecapa_loss=0.0001237, whisper_loss=0.07874, over 21651.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.08832, over 3825505.30 frames. ], batch size: 88, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:49:11,942 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-08-18 23:49:13,746 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 22 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-18 23:49:14,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.209e+01 2.441e+01 2.717e+01 4.171e+01, threshold=4.882e+01, percent-clipped=0.0 2024-08-18 23:49:17,728 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
26 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-18 23:49:26,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4165270.0, ans=0.2 2024-08-18 23:49:27,345 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-18 23:49:32,477 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-18 23:49:38,438 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 29 from LS+wenet, 17 from Vox, 14 fro AS 2024-08-18 23:49:53,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.81 vs. limit=15.0 2024-08-18 23:49:57,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4165470.0, ans=0.025 2024-08-18 23:49:57,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4165470.0, ans=0.0 2024-08-18 23:49:58,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4165470.0, ans=0.125 2024-08-18 23:50:05,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 950, loss[loss=0.1002, beats_loss=0.01297, ecapa_loss=0.0001291, whisper_loss=0.08589, over 19225.00 frames. ], tot_loss[loss=0.09975, beats_loss=0.01043, ecapa_loss=0.0001412, whisper_loss=0.0879, over 3807638.86 frames. ], batch size: 76, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:50:05,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4165570.0, ans=0.0 2024-08-18 23:50:09,460 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 14 from Vox, 41 fro AS 2024-08-18 23:50:23,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4165670.0, ans=0.125 2024-08-18 23:50:24,728 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 23 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-18 23:50:50,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.69 vs. limit=5.0 2024-08-18 23:51:04,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4165970.0, ans=0.0 2024-08-18 23:51:13,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1000, loss[loss=0.09564, beats_loss=0.01154, ecapa_loss=0.0001233, whisper_loss=0.08287, over 16099.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01038, ecapa_loss=0.0001414, whisper_loss=0.08864, over 3806940.08 frames. ], batch size: 64, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:51:19,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4166070.0, ans=0.0 2024-08-18 23:51:24,703 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-18 23:51:28,762 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-18 23:51:31,120 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.375e+01 2.563e+01 2.832e+01 6.372e+01, threshold=5.125e+01, percent-clipped=2.0 2024-08-18 23:51:37,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4166170.0, ans=0.04949747468305833 2024-08-18 23:51:47,780 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-18 23:51:54,676 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 12 from Vox, 30 fro AS 2024-08-18 23:52:13,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4166470.0, ans=0.2 2024-08-18 23:52:21,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1050, loss[loss=0.106, beats_loss=0.01129, ecapa_loss=0.0001317, whisper_loss=0.09338, over 19898.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01034, ecapa_loss=0.0001415, whisper_loss=0.08917, over 3853912.84 frames. ], batch size: 78, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:52:24,939 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-18 23:52:25,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.66 vs. limit=15.0 2024-08-18 23:52:26,079 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 22 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-18 23:52:34,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4166670.0, ans=0.1 2024-08-18 23:52:40,263 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.51 vs. limit=10.0 2024-08-18 23:52:41,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4166670.0, ans=0.2 2024-08-18 23:52:42,038 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
26 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-18 23:53:04,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4166870.0, ans=0.0 2024-08-18 23:53:13,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4166870.0, ans=0.0 2024-08-18 23:53:15,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4166970.0, ans=0.1 2024-08-18 23:53:24,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4166970.0, ans=0.125 2024-08-18 23:53:24,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4166970.0, ans=0.0 2024-08-18 23:53:29,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1100, loss[loss=0.0712, beats_loss=0.01012, ecapa_loss=0.0001334, whisper_loss=0.05974, over 14234.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0103, ecapa_loss=0.0001405, whisper_loss=0.09026, over 3858915.29 frames. ], batch size: 53, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:53:32,869 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-18 23:53:32,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4167070.0, ans=0.0 2024-08-18 23:53:33,813 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-18 23:53:41,637 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
30 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-18 23:53:46,319 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.348e+01 2.551e+01 2.998e+01 5.491e+01, threshold=5.102e+01, percent-clipped=1.0 2024-08-18 23:54:00,344 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 31 from Vox, 35 fro AS 2024-08-18 23:54:15,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4167370.0, ans=0.125 2024-08-18 23:54:31,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4167470.0, ans=0.125 2024-08-18 23:54:36,200 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1150, loss[loss=0.1004, beats_loss=0.01143, ecapa_loss=0.0001236, whisper_loss=0.08769, over 17416.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01035, ecapa_loss=0.0001416, whisper_loss=0.08927, over 3853695.81 frames. ], batch size: 67, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:54:38,658 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. 
limit=15.0 2024-08-18 23:54:51,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4167670.0, ans=0.0 2024-08-18 23:54:56,666 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.053e+00 2024-08-18 23:55:19,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4167870.0, ans=0.125 2024-08-18 23:55:27,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4167870.0, ans=0.2 2024-08-18 23:55:37,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4167970.0, ans=0.125 2024-08-18 23:55:43,874 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1200, loss[loss=0.1067, beats_loss=0.008497, ecapa_loss=0.0001765, whisper_loss=0.09645, over 21622.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.08905, over 3834085.71 frames. ], batch size: 86, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:56:01,575 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.277e+01 2.488e+01 2.834e+01 2.594e+02, threshold=4.975e+01, percent-clipped=3.0 2024-08-18 23:56:06,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4168170.0, ans=0.2 2024-08-18 23:56:07,056 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
34 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-18 23:56:11,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4168270.0, ans=0.0 2024-08-18 23:56:15,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4168270.0, ans=0.125 2024-08-18 23:56:17,495 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-18 23:56:28,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4168370.0, ans=0.125 2024-08-18 23:56:39,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0 2024-08-18 23:56:51,137 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1250, loss[loss=0.1144, beats_loss=0.009564, ecapa_loss=0.0001371, whisper_loss=0.1034, over 20444.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.08907, over 3815116.42 frames. ], batch size: 75, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:56:59,414 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-18 23:57:04,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4168670.0, ans=0.04949747468305833 2024-08-18 23:57:07,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4168670.0, ans=0.125 2024-08-18 23:57:11,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4168670.0, ans=0.0 2024-08-18 23:57:12,023 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-18 23:57:34,459 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 28 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-18 23:57:38,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4168870.0, ans=0.0 2024-08-18 23:57:40,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4168870.0, ans=0.1 2024-08-18 23:57:46,440 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-18 23:57:49,408 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 23 from Vox, 46 fro AS 2024-08-18 23:57:56,430 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-08-18 23:57:58,111 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1300, loss[loss=0.1096, beats_loss=0.01035, ecapa_loss=0.0001536, whisper_loss=0.0977, over 16991.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.08944, over 3811087.35 frames. ], batch size: 67, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:58:07,207 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2024-08-18 23:58:17,207 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.654e+01 2.196e+01 2.537e+01 2.815e+01 5.238e+01, threshold=5.075e+01, percent-clipped=1.0 2024-08-18 23:58:21,010 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.24 vs. 
limit=15.0 2024-08-18 23:58:26,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4169270.0, ans=0.025 2024-08-18 23:58:34,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4169270.0, ans=0.125 2024-08-18 23:58:41,016 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-18 23:58:42,818 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-18 23:58:52,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4169470.0, ans=0.0 2024-08-18 23:58:58,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4169470.0, ans=0.025 2024-08-18 23:59:06,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1350, loss[loss=0.1009, beats_loss=0.01052, ecapa_loss=0.0001234, whisper_loss=0.08913, over 17666.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01038, ecapa_loss=0.0001406, whisper_loss=0.09012, over 3820119.33 frames. ], batch size: 69, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-18 23:59:16,927 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2024-08-18 23:59:35,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4169770.0, ans=0.0 2024-08-18 23:59:37,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4169770.0, ans=0.125 2024-08-18 23:59:40,507 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 00:00:00,674 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 00:00:02,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.90 vs. limit=15.0 2024-08-19 00:00:10,439 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 00:00:14,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1400, loss[loss=0.09737, beats_loss=0.01221, ecapa_loss=0.0001477, whisper_loss=0.08368, over 18656.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001405, whisper_loss=0.08967, over 3830313.52 frames. ], batch size: 79, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:00:23,562 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4170070.0, ans=0.125 2024-08-19 00:00:28,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4170170.0, ans=0.125 2024-08-19 00:00:33,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.158e+01 2.419e+01 2.652e+01 3.776e+01, threshold=4.839e+01, percent-clipped=0.0 2024-08-19 00:00:36,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2024-08-19 00:00:47,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4170270.0, ans=0.2 2024-08-19 00:01:05,059 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
21 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 00:01:22,104 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1450, loss[loss=0.1093, beats_loss=0.006815, ecapa_loss=0.0001513, whisper_loss=0.101, over 16272.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01045, ecapa_loss=0.0001403, whisper_loss=0.08864, over 3809487.36 frames. ], batch size: 63, lr: 2.13e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:01:53,379 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 00:01:59,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4170570.0, ans=0.125 2024-08-19 00:02:29,060 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 00:02:39,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4170870.0, ans=0.0 2024-08-19 00:02:49,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4170970.0, ans=0.125 2024-08-19 00:02:54,665 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 13 from Vox, 40 fro AS 2024-08-19 00:03:05,593 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1500, loss[loss=0.07869, beats_loss=0.01135, ecapa_loss=0.0001521, whisper_loss=0.06582, over 14334.00 frames. ], tot_loss[loss=0.1, beats_loss=0.01045, ecapa_loss=0.0001397, whisper_loss=0.08817, over 3751828.72 frames. 
], batch size: 60, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:03:06,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4171070.0, ans=0.125 2024-08-19 00:03:09,505 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4171070.0, ans=0.0 2024-08-19 00:03:13,617 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 00:03:21,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4171170.0, ans=0.1 2024-08-19 00:03:27,146 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.239e+01 2.511e+01 2.818e+01 6.129e+01, threshold=5.023e+01, percent-clipped=1.0 2024-08-19 00:03:32,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4171170.0, ans=0.2 2024-08-19 00:03:38,425 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4171270.0, ans=0.125 2024-08-19 00:03:38,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4171270.0, ans=0.1 2024-08-19 00:03:42,269 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 00:03:42,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=12.0 2024-08-19 00:03:52,845 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
38 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 00:04:18,656 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:04:21,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1550, loss[loss=0.1055, beats_loss=0.008121, ecapa_loss=0.0001339, whisper_loss=0.096, over 20185.00 frames. ], tot_loss[loss=0.1004, beats_loss=0.01043, ecapa_loss=0.0001393, whisper_loss=0.08862, over 3789927.65 frames. ], batch size: 76, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:04:35,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4171670.0, ans=0.125 2024-08-19 00:04:35,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4171670.0, ans=0.0 2024-08-19 00:04:38,030 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 29 from Vox, 38 fro AS 2024-08-19 00:04:47,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4171670.0, ans=0.125 2024-08-19 00:04:49,858 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:04:56,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4171770.0, ans=0.1 2024-08-19 00:05:04,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4171870.0, ans=0.125 2024-08-19 00:05:07,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4171870.0, ans=0.125 2024-08-19 00:05:07,983 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 00:05:26,392 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2024-08-19 00:05:28,234 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 00:05:29,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4171970.0, ans=0.125 2024-08-19 00:05:34,416 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1600, loss[loss=0.1148, beats_loss=0.008779, ecapa_loss=0.0001786, whisper_loss=0.1043, over 21892.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01044, ecapa_loss=0.0001384, whisper_loss=0.08891, over 3811984.67 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:05:56,212 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.800e+01 2.337e+01 2.619e+01 2.865e+01 4.288e+01, threshold=5.238e+01, percent-clipped=0.0 2024-08-19 00:06:02,353 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2024-08-19 00:06:03,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4172270.0, ans=0.05 2024-08-19 00:06:23,861 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 00:06:35,258 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-19 00:06:42,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.85 vs. limit=10.0 2024-08-19 00:06:43,429 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 00:06:43,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4172470.0, ans=0.2 2024-08-19 00:06:47,194 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1650, loss[loss=0.1108, beats_loss=0.01053, ecapa_loss=0.0001426, whisper_loss=0.09888, over 22375.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01036, ecapa_loss=0.0001386, whisper_loss=0.0893, over 3797236.84 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:07:00,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4172570.0, ans=0.95 2024-08-19 00:07:02,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4172570.0, ans=0.125 2024-08-19 00:07:09,322 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4172670.0, ans=0.0 2024-08-19 00:07:13,598 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 00:07:16,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4172670.0, ans=0.0 2024-08-19 00:07:18,737 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 14 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 00:07:22,146 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 00:07:32,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4172870.0, ans=0.125 2024-08-19 00:07:32,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.80 vs. 
limit=15.0 2024-08-19 00:07:34,299 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2024-08-19 00:07:38,334 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2024-08-19 00:07:41,888 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 00:07:47,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4172970.0, ans=0.125 2024-08-19 00:07:48,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.51 vs. limit=15.0 2024-08-19 00:07:52,149 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 00:07:54,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4172970.0, ans=0.125 2024-08-19 00:07:59,463 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1700, loss[loss=0.09352, beats_loss=0.01253, ecapa_loss=0.0001186, whisper_loss=0.0798, over 17499.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01031, ecapa_loss=0.000139, whisper_loss=0.08975, over 3806778.44 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:08:11,016 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 00:08:22,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.287e+01 2.515e+01 2.871e+01 4.134e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-19 00:08:38,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4173270.0, ans=0.09899494936611666 2024-08-19 00:08:42,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.49 vs. limit=10.0 2024-08-19 00:08:43,717 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 00:09:20,199 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 00:09:27,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1750, loss[loss=0.1142, beats_loss=0.009929, ecapa_loss=0.0001205, whisper_loss=0.103, over 24279.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01032, ecapa_loss=0.000139, whisper_loss=0.08969, over 3853734.59 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:09:43,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4173670.0, ans=0.125 2024-08-19 00:09:48,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4173670.0, ans=0.09899494936611666 2024-08-19 00:10:01,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2024-08-19 00:10:05,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4173770.0, ans=0.125 2024-08-19 00:10:08,036 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 00:10:09,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4173770.0, ans=0.125 2024-08-19 00:10:41,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4173970.0, ans=0.07 2024-08-19 00:10:52,447 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1800, loss[loss=0.08987, beats_loss=0.01474, ecapa_loss=8.261e-05, whisper_loss=0.0743, over 16288.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001381, whisper_loss=0.0896, over 3838789.18 frames. ], batch size: 62, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:10:52,607 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 00:10:56,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.50 vs. limit=15.0 2024-08-19 00:10:56,506 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 00:11:19,703 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.235e+01 2.459e+01 2.784e+01 3.423e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-19 00:11:25,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4174170.0, ans=0.0 2024-08-19 00:12:17,521 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
31 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 00:12:27,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4174470.0, ans=0.125 2024-08-19 00:12:27,150 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:12:28,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4174470.0, ans=0.125 2024-08-19 00:12:29,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4174470.0, ans=15.0 2024-08-19 00:12:35,641 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1850, loss[loss=0.09289, beats_loss=0.008149, ecapa_loss=0.0001712, whisper_loss=0.08303, over 17946.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01033, ecapa_loss=0.0001381, whisper_loss=0.08994, over 3842158.78 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:13:20,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-19 00:13:24,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-08-19 00:13:52,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4174870.0, ans=0.125 2024-08-19 00:14:09,742 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-19 00:14:19,594 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1900, loss[loss=0.09869, beats_loss=0.01004, ecapa_loss=0.0001284, whisper_loss=0.08737, over 21306.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.0102, ecapa_loss=0.0001388, whisper_loss=0.09014, over 3836111.27 frames. ], batch size: 83, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:14:24,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4175070.0, ans=0.125 2024-08-19 00:14:24,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4175070.0, ans=0.04949747468305833 2024-08-19 00:14:48,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.320e+01 2.567e+01 2.860e+01 4.992e+01, threshold=5.134e+01, percent-clipped=1.0 2024-08-19 00:14:59,645 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 32 from Vox, 38 fro AS 2024-08-19 00:15:05,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4175270.0, ans=0.025 2024-08-19 00:15:48,092 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 00:15:56,728 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=4175470.0, ans=0.025 2024-08-19 00:15:58,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 1950, loss[loss=0.1145, beats_loss=0.008322, ecapa_loss=0.0001523, whisper_loss=0.1047, over 18430.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01025, ecapa_loss=0.0001396, whisper_loss=0.08957, over 3799368.07 frames. ], batch size: 70, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:16:10,318 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 00:16:11,581 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
27 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 00:16:36,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4175770.0, ans=0.0 2024-08-19 00:16:37,645 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 00:16:38,947 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 00:16:44,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4175870.0, ans=0.125 2024-08-19 00:16:57,456 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 00:17:10,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2000, loss[loss=0.1066, beats_loss=0.008436, ecapa_loss=0.0001403, whisper_loss=0.09679, over 19090.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.08929, over 3790685.06 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:17:12,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4176070.0, ans=0.0 2024-08-19 00:17:30,901 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.262e+01 2.428e+01 2.730e+01 5.039e+01, threshold=4.856e+01, percent-clipped=0.0 2024-08-19 00:17:31,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4176170.0, ans=0.1 2024-08-19 00:17:40,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4176270.0, ans=0.2 2024-08-19 00:17:42,458 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
34 from LS+wenet, 30 from Vox, 26 fro AS 2024-08-19 00:17:58,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4176370.0, ans=0.125 2024-08-19 00:18:18,639 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4176470.0, ans=0.0 2024-08-19 00:18:22,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2050, loss[loss=0.0862, beats_loss=0.01055, ecapa_loss=0.0001251, whisper_loss=0.0744, over 18151.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.000139, whisper_loss=0.08964, over 3823589.25 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:18:25,846 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.94 vs. limit=12.0 2024-08-19 00:18:36,588 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 18 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 00:18:38,330 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 00:18:45,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4176670.0, ans=0.05 2024-08-19 00:18:53,440 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 00:18:59,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4176770.0, ans=0.95 2024-08-19 00:19:07,950 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 00:19:09,095 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. 
limit=15.0 2024-08-19 00:19:11,314 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 10 from Vox, 38 fro AS 2024-08-19 00:19:21,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4176970.0, ans=0.0 2024-08-19 00:19:23,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4176970.0, ans=0.0 2024-08-19 00:19:33,330 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2100, loss[loss=0.1263, beats_loss=0.0103, ecapa_loss=0.000123, whisper_loss=0.1148, over 23194.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01036, ecapa_loss=0.0001383, whisper_loss=0.08945, over 3797371.21 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:19:41,775 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 00:19:54,052 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.287e+01 2.483e+01 2.876e+01 4.561e+01, threshold=4.966e+01, percent-clipped=0.0 2024-08-19 00:19:57,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4177170.0, ans=0.0 2024-08-19 00:19:58,350 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 00:20:23,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4177370.0, ans=0.125 2024-08-19 00:20:28,073 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.97 vs. 
limit=15.0 2024-08-19 00:20:41,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4177470.0, ans=0.125 2024-08-19 00:20:42,757 WARNING [optim.py:496] (3/4) Scaling gradients by 0.019715236499905586, model_norm_threshold=49.664920806884766 2024-08-19 00:20:42,929 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.827e+05, grad_sumsq=8.827e+05, orig_rms_sq=1.000e+00 2024-08-19 00:20:44,498 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2150, loss[loss=0.1104, beats_loss=0.009532, ecapa_loss=0.0001572, whisper_loss=0.09932, over 18575.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01042, ecapa_loss=0.0001385, whisper_loss=0.08928, over 3762147.94 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:20:46,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4177570.0, ans=0.125 2024-08-19 00:21:07,385 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 19 from LS+wenet, 18 from Vox, 49 fro AS 2024-08-19 00:21:31,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4177870.0, ans=0.1 2024-08-19 00:21:31,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4177870.0, ans=0.04949747468305833 2024-08-19 00:21:31,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.87 vs. 
limit=15.0 2024-08-19 00:21:42,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4177970.0, ans=0.125 2024-08-19 00:21:54,891 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2200, loss[loss=0.1079, beats_loss=0.009233, ecapa_loss=0.000117, whisper_loss=0.09754, over 23969.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001388, whisper_loss=0.08989, over 3808003.74 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:21:54,992 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 00:22:14,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.401e+01 2.663e+01 2.914e+01 2.519e+03, threshold=5.327e+01, percent-clipped=2.0 2024-08-19 00:22:15,438 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.972e+01 2024-08-19 00:22:19,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4178170.0, ans=0.0 2024-08-19 00:22:44,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4178370.0, ans=0.0 2024-08-19 00:22:56,289 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 26 from LS+wenet, 10 from Vox, 23 fro AS 2024-08-19 00:22:59,162 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 00:23:01,816 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 00:23:06,175 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2250, loss[loss=0.1155, beats_loss=0.01003, ecapa_loss=0.0001221, whisper_loss=0.1043, over 22991.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001387, whisper_loss=0.09007, over 3790446.87 frames. 
], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:23:07,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4178570.0, ans=0.125 2024-08-19 00:23:09,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4178570.0, ans=0.0 2024-08-19 00:23:18,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4178570.0, ans=0.0 2024-08-19 00:23:27,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.17 vs. limit=6.0 2024-08-19 00:23:29,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4178670.0, ans=0.2 2024-08-19 00:23:30,409 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 10 from Vox, 28 fro AS 2024-08-19 00:23:47,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.12 vs. limit=15.0 2024-08-19 00:23:49,689 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 00:24:18,227 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2300, loss[loss=0.106, beats_loss=0.009706, ecapa_loss=0.0001282, whisper_loss=0.09498, over 23592.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001391, whisper_loss=0.09061, over 3788077.63 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:24:26,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4179070.0, ans=0.2 2024-08-19 00:24:26,917 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
19 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 00:24:27,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4179070.0, ans=0.125 2024-08-19 00:24:28,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4179070.0, ans=0.07 2024-08-19 00:24:38,206 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.287e+01 2.550e+01 2.826e+01 7.255e+01, threshold=5.099e+01, percent-clipped=2.0 2024-08-19 00:24:51,583 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.054e-01 2024-08-19 00:24:53,664 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 00:25:06,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4179370.0, ans=0.125 2024-08-19 00:25:09,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4179370.0, ans=0.1 2024-08-19 00:25:18,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4179470.0, ans=0.125 2024-08-19 00:25:28,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2350, loss[loss=0.1072, beats_loss=0.009989, ecapa_loss=0.0001509, whisper_loss=0.0957, over 22444.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01039, ecapa_loss=0.0001409, whisper_loss=0.09155, over 3839786.18 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:25:31,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4179570.0, ans=0.0 2024-08-19 00:26:17,775 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
20 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 00:26:21,085 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-08-19 00:26:26,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4179970.0, ans=0.125 2024-08-19 00:26:27,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4179970.0, ans=0.2 2024-08-19 00:26:39,340 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2400, loss[loss=0.08988, beats_loss=0.009152, ecapa_loss=0.0001812, whisper_loss=0.07892, over 15800.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.000141, whisper_loss=0.09104, over 3853182.96 frames. ], batch size: 69, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:26:46,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4180070.0, ans=0.1 2024-08-19 00:26:59,949 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.410e+01 2.624e+01 2.909e+01 9.467e+01, threshold=5.248e+01, percent-clipped=3.0 2024-08-19 00:27:00,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4180170.0, ans=0.125 2024-08-19 00:27:03,343 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.35 vs. limit=15.0 2024-08-19 00:27:35,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4180470.0, ans=0.09899494936611666 2024-08-19 00:27:39,110 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 00:27:46,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4180470.0, ans=0.125 2024-08-19 00:27:49,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4180570.0, ans=0.125 2024-08-19 00:27:49,917 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2450, loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.0001343, whisper_loss=0.09031, over 23145.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01054, ecapa_loss=0.0001403, whisper_loss=0.09103, over 3887797.01 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:27:55,000 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.098e+05 2024-08-19 00:28:06,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4180670.0, ans=0.125 2024-08-19 00:28:06,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4180670.0, ans=0.125 2024-08-19 00:28:25,786 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 00:28:45,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0 2024-08-19 00:28:54,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4180970.0, ans=0.125 2024-08-19 00:28:59,541 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2500, loss[loss=0.1197, beats_loss=0.00748, ecapa_loss=0.0001603, whisper_loss=0.1106, over 14650.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01046, ecapa_loss=0.0001405, whisper_loss=0.09126, over 3897080.96 frames. 
], batch size: 58, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:28:59,705 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 00:29:09,066 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 00:29:18,098 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.325e+01 2.522e+01 2.873e+01 3.781e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 00:29:24,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4181170.0, ans=0.125 2024-08-19 00:29:33,878 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 32 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 00:29:42,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.53 vs. limit=15.0 2024-08-19 00:30:07,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2550, loss[loss=0.0645, beats_loss=0.01393, ecapa_loss=0.0001329, whisper_loss=0.04924, over 15871.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01035, ecapa_loss=0.0001404, whisper_loss=0.09168, over 3883220.39 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:30:07,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4181570.0, ans=0.125 2024-08-19 00:30:10,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-08-19 00:30:13,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.96 vs. 
limit=22.5 2024-08-19 00:30:17,388 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4181570.0, ans=0.1 2024-08-19 00:30:19,565 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 00:30:27,265 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.57 vs. limit=15.0 2024-08-19 00:30:31,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4181670.0, ans=0.0 2024-08-19 00:30:46,579 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 25 from LS+wenet, 24 from Vox, 47 fro AS 2024-08-19 00:30:53,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4181870.0, ans=0.125 2024-08-19 00:30:54,976 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:31:14,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2600, loss[loss=0.1204, beats_loss=0.00745, ecapa_loss=0.0001726, whisper_loss=0.1112, over 19861.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01033, ecapa_loss=0.0001403, whisper_loss=0.0913, over 3882697.64 frames. ], batch size: 81, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:31:32,364 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.363e+01 2.681e+01 3.007e+01 2.480e+02, threshold=5.362e+01, percent-clipped=3.0 2024-08-19 00:31:34,065 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
16 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 00:31:45,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4182270.0, ans=0.125 2024-08-19 00:31:46,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.69 vs. limit=15.0 2024-08-19 00:31:47,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4182270.0, ans=0.125 2024-08-19 00:31:48,861 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 00:31:51,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.94 vs. limit=10.0 2024-08-19 00:32:04,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4182470.0, ans=0.1 2024-08-19 00:32:08,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4182470.0, ans=0.0 2024-08-19 00:32:18,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2650, loss[loss=0.1018, beats_loss=0.01155, ecapa_loss=0.0001356, whisper_loss=0.08893, over 19216.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01039, ecapa_loss=0.0001408, whisper_loss=0.0909, over 3885684.33 frames. 
], batch size: 77, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:32:25,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4182570.0, ans=0.05 2024-08-19 00:32:41,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4182670.0, ans=0.2 2024-08-19 00:33:06,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4182870.0, ans=0.04949747468305833 2024-08-19 00:33:21,988 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2700, loss[loss=0.111, beats_loss=0.009435, ecapa_loss=0.000136, whisper_loss=0.1002, over 22784.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001398, whisper_loss=0.08988, over 3915041.60 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:33:37,000 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.06 vs. limit=12.0 2024-08-19 00:33:39,787 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.289e+01 2.477e+01 2.694e+01 4.905e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-19 00:33:48,901 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 00:33:49,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2024-08-19 00:33:52,793 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4183270.0, ans=0.125 2024-08-19 00:34:00,630 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 00:34:00,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4183370.0, ans=0.125 2024-08-19 00:34:06,979 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4183370.0, ans=0.2 2024-08-19 00:34:13,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4183470.0, ans=0.125 2024-08-19 00:34:24,778 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 00:34:25,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2750, loss[loss=0.1033, beats_loss=0.01077, ecapa_loss=0.000127, whisper_loss=0.09126, over 19938.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001411, whisper_loss=0.09015, over 3906703.72 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:34:46,736 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-08-19 00:35:06,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4183870.0, ans=0.125 2024-08-19 00:35:09,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4183870.0, ans=0.0 2024-08-19 00:35:14,191 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 00:35:16,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.65 vs. limit=22.5 2024-08-19 00:35:22,118 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 00:35:29,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2800, loss[loss=0.1199, beats_loss=0.009183, ecapa_loss=0.000122, whisper_loss=0.1095, over 23519.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001407, whisper_loss=0.09045, over 3905034.50 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:35:43,690 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 00:35:47,524 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.309e+01 2.538e+01 2.850e+01 3.934e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-19 00:35:51,718 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 10 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 00:36:07,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4184370.0, ans=0.0 2024-08-19 00:36:14,844 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 00:36:27,829 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 00:36:33,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2850, loss[loss=0.1154, beats_loss=0.008505, ecapa_loss=0.0001395, whisper_loss=0.1055, over 21562.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.09037, over 3876991.73 frames. ], batch size: 84, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 00:36:40,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.85 vs. 
limit=22.5 2024-08-19 00:36:59,993 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4184770.0, ans=0.125 2024-08-19 00:37:09,044 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 00:37:12,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4184870.0, ans=0.125 2024-08-19 00:37:12,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4184870.0, ans=0.2 2024-08-19 00:37:14,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4184870.0, ans=0.125 2024-08-19 00:37:24,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4184970.0, ans=0.0 2024-08-19 00:37:33,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4184970.0, ans=0.125 2024-08-19 00:37:37,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2900, loss[loss=0.1041, beats_loss=0.009863, ecapa_loss=0.0001986, whisper_loss=0.09229, over 22339.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001415, whisper_loss=0.0902, over 3908658.97 frames. ], batch size: 94, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:37:56,752 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.844e+01 2.390e+01 2.645e+01 3.019e+01 5.767e+01, threshold=5.291e+01, percent-clipped=1.0 2024-08-19 00:37:58,172 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 00:38:05,300 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 00:38:17,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4185370.0, ans=0.125 2024-08-19 00:38:17,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4185370.0, ans=0.1 2024-08-19 00:38:34,891 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 00:38:41,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 2950, loss[loss=0.09752, beats_loss=0.0109, ecapa_loss=0.0001626, whisper_loss=0.08499, over 16545.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01044, ecapa_loss=0.0001422, whisper_loss=0.09067, over 3887540.14 frames. ], batch size: 67, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:39:08,357 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4185770.0, ans=0.125 2024-08-19 00:39:18,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4185870.0, ans=0.125 2024-08-19 00:39:18,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=4185870.0, ans=0.2 2024-08-19 00:39:24,066 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
28 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 00:39:25,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4185870.0, ans=0.1 2024-08-19 00:39:30,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4185970.0, ans=0.125 2024-08-19 00:39:37,291 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4185970.0, ans=0.125 2024-08-19 00:39:39,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4185970.0, ans=0.0 2024-08-19 00:39:44,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3000, loss[loss=0.1078, beats_loss=0.01047, ecapa_loss=0.0001398, whisper_loss=0.09595, over 22221.00 frames. ], tot_loss[loss=0.103, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09115, over 3878956.70 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:39:44,472 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 00:40:22,176 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on ASR_libri: loss=0.2528, beats_loss=0, ecapa_loss=0.0005176, whisper_loss=0.2476, over 922467.00 frames. 2024-08-19 00:40:37,502 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on SV_voxceleb1: loss=0.004065, beats_loss=0, ecapa_loss=0.0004065, whisper_loss=0, over 939242.00 frames. 
2024-08-19 00:40:55,099 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8910, 2.2516, 1.9052, 1.5783, 1.7710, 1.6643, 2.1155, 2.0030], device='cuda:3') 2024-08-19 00:42:03,030 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.6802, 2.0276, 2.2497, 1.6089, 1.8401, 2.4344, 2.9276, 1.8096], device='cuda:3') 2024-08-19 00:42:25,782 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 00:42:25,786 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 00:42:34,940 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 00:42:39,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4186170.0, ans=0.125 2024-08-19 00:42:44,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.286e+01 2.543e+01 2.791e+01 3.821e+01, threshold=5.086e+01, percent-clipped=0.0 2024-08-19 00:43:03,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4186370.0, ans=0.0 2024-08-19 00:43:13,475 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 00:43:14,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4186370.0, ans=0.125 2024-08-19 00:43:15,330 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.80 vs. limit=6.0 2024-08-19 00:43:17,053 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 00:43:21,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4186470.0, ans=0.125 2024-08-19 00:43:29,448 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3050, loss[loss=0.09165, beats_loss=0.01318, ecapa_loss=0.0001314, whisper_loss=0.07715, over 18817.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001419, whisper_loss=0.09166, over 3897045.56 frames. ], batch size: 77, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:43:38,517 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 00:43:50,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.05 vs. limit=15.0 2024-08-19 00:44:00,013 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 00:44:18,983 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 00:44:19,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4186970.0, ans=0.1 2024-08-19 00:44:25,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4186970.0, ans=0.125 2024-08-19 00:44:32,911 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3100, loss[loss=0.1142, beats_loss=0.01166, ecapa_loss=0.0001342, whisper_loss=0.1012, over 17532.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01042, ecapa_loss=0.0001421, whisper_loss=0.09191, over 3905770.58 frames. ], batch size: 69, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:44:33,120 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 00:44:37,068 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 00:44:41,958 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 13 from Vox, 38 fro AS 2024-08-19 00:44:43,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4187070.0, ans=0.2 2024-08-19 00:44:52,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.319e+01 2.529e+01 2.804e+01 4.634e+01, threshold=5.057e+01, percent-clipped=0.0 2024-08-19 00:45:10,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4187370.0, ans=0.125 2024-08-19 00:45:12,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4187370.0, ans=0.2 2024-08-19 00:45:12,197 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-08-19 00:45:29,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4187470.0, ans=0.125 2024-08-19 00:45:32,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4187470.0, ans=0.2 2024-08-19 00:45:36,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3150, loss[loss=0.1129, beats_loss=0.01119, ecapa_loss=0.0001307, whisper_loss=0.1004, over 18340.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01051, ecapa_loss=0.000142, whisper_loss=0.09147, over 3879668.54 frames. ], batch size: 69, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:45:37,259 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
15 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 00:45:42,132 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 00:45:54,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4187670.0, ans=0.0 2024-08-19 00:46:08,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0 2024-08-19 00:46:18,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4187870.0, ans=0.05 2024-08-19 00:46:32,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4187970.0, ans=0.1 2024-08-19 00:46:34,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4187970.0, ans=0.125 2024-08-19 00:46:37,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4187970.0, ans=0.0 2024-08-19 00:46:40,750 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3200, loss[loss=0.1433, beats_loss=0.007623, ecapa_loss=0.0001644, whisper_loss=0.1341, over 23277.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01051, ecapa_loss=0.0001423, whisper_loss=0.09166, over 3882973.16 frames. ], batch size: 89, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:46:59,733 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.847e+01 2.318e+01 2.522e+01 2.852e+01 4.136e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 00:47:08,825 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 00:47:15,054 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
21 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 00:47:16,390 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 00:47:20,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4188370.0, ans=0.1 2024-08-19 00:47:30,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4188470.0, ans=0.0 2024-08-19 00:47:33,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4188470.0, ans=0.0 2024-08-19 00:47:43,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3250, loss[loss=0.09688, beats_loss=0.01006, ecapa_loss=0.000125, whisper_loss=0.08557, over 20066.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.09187, over 3891788.67 frames. ], batch size: 79, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:48:00,377 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 00:48:16,250 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:48:27,338 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-19 00:48:30,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4188870.0, ans=0.07 2024-08-19 00:48:32,327 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-19 00:48:41,297 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 00:48:47,429 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3300, loss[loss=0.09342, beats_loss=0.01249, ecapa_loss=0.0001337, whisper_loss=0.0796, over 23019.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01048, ecapa_loss=0.0001415, whisper_loss=0.09194, over 3931084.69 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:48:55,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4189070.0, ans=0.1 2024-08-19 00:48:59,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.28 vs. limit=15.0 2024-08-19 00:49:01,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4189170.0, ans=0.125 2024-08-19 00:49:06,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.329e+01 2.531e+01 2.845e+01 1.091e+02, threshold=5.061e+01, percent-clipped=2.0 2024-08-19 00:49:08,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4189170.0, ans=0.125 2024-08-19 00:49:10,710 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-08-19 00:49:11,353 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 00:49:15,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4189270.0, ans=0.125 2024-08-19 00:49:16,657 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 00:49:23,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4189270.0, ans=0.125 2024-08-19 00:49:25,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4189370.0, ans=0.125 2024-08-19 00:49:25,970 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2024-08-19 00:49:35,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4189370.0, ans=0.125 2024-08-19 00:49:38,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4189470.0, ans=0.09899494936611666 2024-08-19 00:49:45,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4189470.0, ans=0.125 2024-08-19 00:49:48,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4189470.0, ans=0.125 2024-08-19 00:49:50,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3350, loss[loss=0.1138, beats_loss=0.009838, ecapa_loss=0.0001414, whisper_loss=0.1026, over 20937.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01045, ecapa_loss=0.0001418, whisper_loss=0.09154, over 3910711.37 frames. ], batch size: 82, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:50:06,530 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4189670.0, ans=0.0 2024-08-19 00:50:36,966 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
24 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 00:50:38,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4189870.0, ans=0.125 2024-08-19 00:50:41,043 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 00:50:42,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4189970.0, ans=0.0 2024-08-19 00:50:43,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.92 vs. limit=12.0 2024-08-19 00:50:54,745 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3400, loss[loss=0.1125, beats_loss=0.008522, ecapa_loss=0.0001389, whisper_loss=0.1026, over 23147.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001417, whisper_loss=0.09025, over 3909896.89 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:51:13,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.286e+01 2.558e+01 2.999e+01 2.108e+02, threshold=5.116e+01, percent-clipped=4.0 2024-08-19 00:51:14,438 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=10.0 2024-08-19 00:51:19,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4190270.0, ans=0.125 2024-08-19 00:51:25,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2024-08-19 00:51:42,119 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.03 vs. 
limit=15.0 2024-08-19 00:51:48,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4190470.0, ans=0.125 2024-08-19 00:51:59,239 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3450, loss[loss=0.1262, beats_loss=0.007617, ecapa_loss=0.0001839, whisper_loss=0.1168, over 18832.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01045, ecapa_loss=0.0001416, whisper_loss=0.08995, over 3861213.75 frames. ], batch size: 77, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:52:04,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4190570.0, ans=0.125 2024-08-19 00:52:06,700 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.09 vs. limit=15.0 2024-08-19 00:52:07,354 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 00:52:12,129 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 20 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-19 00:52:21,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4190670.0, ans=0.05 2024-08-19 00:52:23,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4190770.0, ans=0.2 2024-08-19 00:52:41,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4190870.0, ans=0.125 2024-08-19 00:52:56,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4190970.0, ans=0.0 2024-08-19 00:52:56,877 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
19 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-19 00:52:57,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4190970.0, ans=0.125 2024-08-19 00:53:03,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3500, loss[loss=0.09603, beats_loss=0.01135, ecapa_loss=0.0001322, whisper_loss=0.08335, over 17909.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001426, whisper_loss=0.09012, over 3862193.85 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:53:11,009 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 00:53:16,436 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 00:53:21,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4191170.0, ans=0.2 2024-08-19 00:53:22,264 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.660e+01 2.230e+01 2.489e+01 2.768e+01 5.626e+01, threshold=4.978e+01, percent-clipped=1.0 2024-08-19 00:54:00,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4191470.0, ans=0.2 2024-08-19 00:54:06,690 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3550, loss[loss=0.09673, beats_loss=0.01181, ecapa_loss=0.0001367, whisper_loss=0.08356, over 20436.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001431, whisper_loss=0.08974, over 3886308.21 frames. 
], batch size: 84, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:54:09,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4191570.0, ans=0.0 2024-08-19 00:54:10,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4191570.0, ans=0.2 2024-08-19 00:54:27,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4191670.0, ans=0.125 2024-08-19 00:54:29,892 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 00:54:41,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4191770.0, ans=0.0 2024-08-19 00:55:07,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.61 vs. limit=10.0 2024-08-19 00:55:09,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4192070.0, ans=0.0 2024-08-19 00:55:10,336 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3600, loss[loss=0.1039, beats_loss=0.01056, ecapa_loss=0.0001348, whisper_loss=0.09195, over 17425.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001429, whisper_loss=0.0896, over 3893383.46 frames. ], batch size: 68, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:55:14,625 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 00:55:29,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.764e+01 2.283e+01 2.473e+01 2.802e+01 1.020e+02, threshold=4.947e+01, percent-clipped=3.0 2024-08-19 00:55:46,613 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4192270.0, ans=0.125 2024-08-19 00:55:56,181 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 00:56:11,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4192470.0, ans=0.2 2024-08-19 00:56:12,550 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 00:56:13,860 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 00:56:14,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3650, loss[loss=0.1077, beats_loss=0.01088, ecapa_loss=0.0001466, whisper_loss=0.09534, over 17770.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.000143, whisper_loss=0.08947, over 3859192.77 frames. ], batch size: 73, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:56:15,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4192570.0, ans=0.09899494936611666 2024-08-19 00:56:17,160 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-08-19 00:56:18,651 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 00:56:48,566 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. 
limit=15.0 2024-08-19 00:57:18,437 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2024-08-19 00:57:18,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3700, loss[loss=0.06965, beats_loss=0.01349, ecapa_loss=0.0001456, whisper_loss=0.0547, over 15705.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001428, whisper_loss=0.0895, over 3840892.94 frames. ], batch size: 65, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:57:20,513 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 00:57:25,512 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-19 00:57:30,538 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 00:57:35,646 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 17 from Vox, 20 fro AS 2024-08-19 00:57:38,287 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.353e+01 2.591e+01 2.874e+01 4.771e+02, threshold=5.181e+01, percent-clipped=2.0 2024-08-19 00:57:43,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4193270.0, ans=0.125 2024-08-19 00:57:44,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4193270.0, ans=0.2 2024-08-19 00:57:50,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2024-08-19 00:57:50,905 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
32 from LS+wenet, 15 from Vox, 45 fro AS 2024-08-19 00:57:56,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4193370.0, ans=0.125 2024-08-19 00:57:59,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4193370.0, ans=0.025 2024-08-19 00:58:14,251 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 00:58:22,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3750, loss[loss=0.08726, beats_loss=0.01111, ecapa_loss=0.0001358, whisper_loss=0.07478, over 20410.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001422, whisper_loss=0.08932, over 3843161.13 frames. ], batch size: 83, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:58:23,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4193570.0, ans=0.0 2024-08-19 00:58:57,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4193770.0, ans=0.125 2024-08-19 00:59:01,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4193870.0, ans=0.125 2024-08-19 00:59:01,727 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2024-08-19 00:59:03,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0 2024-08-19 00:59:12,551 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 00:59:16,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4193970.0, ans=0.1 2024-08-19 00:59:22,071 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 00:59:22,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4193970.0, ans=0.1 2024-08-19 00:59:25,018 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 00:59:26,213 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 00:59:28,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3800, loss[loss=0.1162, beats_loss=0.01051, ecapa_loss=0.000129, whisper_loss=0.1044, over 23431.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001427, whisper_loss=0.0891, over 3846876.00 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 00:59:35,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4194070.0, ans=0.1 2024-08-19 00:59:38,828 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 00:59:48,902 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.244e+01 2.516e+01 2.813e+01 3.693e+01, threshold=5.032e+01, percent-clipped=0.0 2024-08-19 00:59:50,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4194170.0, ans=0.1 2024-08-19 01:00:01,085 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 01:00:07,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4194370.0, ans=0.125 2024-08-19 01:00:10,133 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-19 01:00:18,254 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 01:00:32,285 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4194470.0, ans=0.125 2024-08-19 01:00:34,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4194570.0, ans=0.125 2024-08-19 01:00:35,734 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3850, loss[loss=0.1174, beats_loss=0.008546, ecapa_loss=0.0001262, whisper_loss=0.1076, over 21036.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.0106, ecapa_loss=0.0001447, whisper_loss=0.08889, over 3859609.62 frames. ], batch size: 77, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:00:48,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4194670.0, ans=0.2 2024-08-19 01:00:53,985 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
26 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 01:01:03,977 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4194770.0, ans=0.125 2024-08-19 01:01:07,835 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:01:11,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4194770.0, ans=0.0 2024-08-19 01:01:25,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4194870.0, ans=0.125 2024-08-19 01:01:30,726 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 01:01:31,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2024-08-19 01:01:40,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4194970.0, ans=0.2 2024-08-19 01:01:43,815 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3900, loss[loss=0.1137, beats_loss=0.01138, ecapa_loss=0.0001435, whisper_loss=0.1009, over 23423.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001438, whisper_loss=0.0902, over 3874175.43 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:01:48,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4195070.0, ans=0.0 2024-08-19 01:01:57,045 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
32 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 01:02:03,756 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.347e+01 2.563e+01 2.946e+01 1.381e+02, threshold=5.126e+01, percent-clipped=1.0 2024-08-19 01:02:04,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4195170.0, ans=0.125 2024-08-19 01:02:12,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4195270.0, ans=0.0 2024-08-19 01:02:23,218 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 01:02:28,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4195370.0, ans=0.015 2024-08-19 01:02:28,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.37 vs. limit=15.0 2024-08-19 01:02:31,375 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4195370.0, ans=0.125 2024-08-19 01:02:41,627 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 01:02:50,773 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 3950, loss[loss=0.1108, beats_loss=0.009719, ecapa_loss=0.0001389, whisper_loss=0.09974, over 18862.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.0001451, whisper_loss=0.09064, over 3874276.82 frames. ], batch size: 71, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:02:51,174 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 01:03:01,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4195570.0, ans=0.0 2024-08-19 01:03:19,632 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 01:03:21,207 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 01:03:25,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4195770.0, ans=0.0 2024-08-19 01:03:47,827 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 11 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 01:03:54,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4195970.0, ans=0.125 2024-08-19 01:03:59,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4000, loss[loss=0.08378, beats_loss=0.01167, ecapa_loss=0.0001557, whisper_loss=0.07055, over 21844.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01036, ecapa_loss=0.000146, whisper_loss=0.09086, over 3855409.48 frames. ], batch size: 93, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:04:00,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4196070.0, ans=0.2 2024-08-19 01:04:08,339 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 01:04:14,693 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 12 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 01:04:18,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.259e+01 2.561e+01 2.838e+01 1.741e+02, threshold=5.122e+01, percent-clipped=1.0 2024-08-19 01:04:20,145 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 01:04:27,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4196270.0, ans=0.125 2024-08-19 01:04:30,574 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 01:04:37,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4196370.0, ans=0.07 2024-08-19 01:04:41,732 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 25 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 01:04:46,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4196370.0, ans=0.125 2024-08-19 01:05:05,294 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 22 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 01:05:06,390 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4050, loss[loss=0.1087, beats_loss=0.008252, ecapa_loss=0.0001136, whisper_loss=0.0993, over 16164.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01038, ecapa_loss=0.0001448, whisper_loss=0.09128, over 3852886.44 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:05:41,698 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.429e-02 2024-08-19 01:05:49,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4196870.0, ans=0.125 2024-08-19 01:05:53,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4196870.0, ans=0.0 2024-08-19 01:06:03,460 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 01:06:07,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4196970.0, ans=0.0 2024-08-19 01:06:11,901 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 26 from Vox, 22 fro AS 2024-08-19 01:06:14,412 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4100, loss[loss=0.113, beats_loss=0.0102, ecapa_loss=0.0001385, whisper_loss=0.1014, over 22578.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01043, ecapa_loss=0.0001445, whisper_loss=0.09124, over 3852382.34 frames. ], batch size: 87, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:06:24,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4197070.0, ans=0.125 2024-08-19 01:06:29,477 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 01:06:34,694 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.750e+01 2.255e+01 2.514e+01 2.841e+01 8.921e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-19 01:06:36,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4197170.0, ans=0.0 2024-08-19 01:06:39,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4197170.0, ans=0.2 2024-08-19 01:06:51,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4197270.0, ans=0.125 2024-08-19 01:07:07,288 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.62 vs. 
limit=12.0 2024-08-19 01:07:07,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4197370.0, ans=15.0 2024-08-19 01:07:23,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4197570.0, ans=0.1 2024-08-19 01:07:24,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4150, loss[loss=0.1007, beats_loss=0.01001, ecapa_loss=0.0001676, whisper_loss=0.08907, over 19857.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01047, ecapa_loss=0.0001442, whisper_loss=0.09134, over 3856864.03 frames. ], batch size: 84, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:07:30,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4197570.0, ans=0.0 2024-08-19 01:07:55,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4197770.0, ans=0.1 2024-08-19 01:07:59,527 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 01:08:12,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4197870.0, ans=0.125 2024-08-19 01:08:23,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.71 vs. limit=22.5 2024-08-19 01:08:32,645 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4200, loss[loss=0.06583, beats_loss=0.01192, ecapa_loss=0.0001678, whisper_loss=0.05224, over 14163.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001436, whisper_loss=0.09067, over 3867968.16 frames. 
], batch size: 62, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:08:33,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2024-08-19 01:08:39,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4198070.0, ans=0.125 2024-08-19 01:08:52,132 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.638e+01 2.233e+01 2.469e+01 2.829e+01 1.799e+02, threshold=4.938e+01, percent-clipped=1.0 2024-08-19 01:09:00,820 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=22.5 2024-08-19 01:09:08,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4198270.0, ans=0.125 2024-08-19 01:09:11,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.37 vs. limit=6.0 2024-08-19 01:09:22,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4198370.0, ans=0.125 2024-08-19 01:09:38,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4250, loss[loss=0.1161, beats_loss=0.01124, ecapa_loss=0.0001184, whisper_loss=0.1037, over 22639.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0105, ecapa_loss=0.0001432, whisper_loss=0.09038, over 3883594.82 frames. ], batch size: 86, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:09:41,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4198570.0, ans=0.125 2024-08-19 01:10:00,496 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
22 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 01:10:11,197 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 16 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 01:10:15,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4198770.0, ans=0.2 2024-08-19 01:10:24,682 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 32 from Vox, 34 fro AS 2024-08-19 01:10:24,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4198870.0, ans=0.05 2024-08-19 01:10:35,719 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2024-08-19 01:10:38,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4198970.0, ans=0.125 2024-08-19 01:10:43,761 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4300, loss[loss=0.08241, beats_loss=0.009955, ecapa_loss=0.0001442, whisper_loss=0.07102, over 16086.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001439, whisper_loss=0.08955, over 3860700.46 frames. 
], batch size: 65, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:11:03,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.233e+01 2.491e+01 2.683e+01 4.196e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-19 01:11:09,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4199270.0, ans=10.0 2024-08-19 01:11:14,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4199270.0, ans=0.2 2024-08-19 01:11:17,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.13 vs. limit=10.0 2024-08-19 01:11:35,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4199470.0, ans=0.0 2024-08-19 01:11:42,661 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 01:11:45,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4199470.0, ans=0.0 2024-08-19 01:11:48,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4350, loss[loss=0.1024, beats_loss=0.009994, ecapa_loss=0.0001615, whisper_loss=0.09077, over 21779.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001451, whisper_loss=0.09036, over 3868537.23 frames. ], batch size: 87, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:11:49,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4199570.0, ans=0.0 2024-08-19 01:11:51,468 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 01:11:53,119 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
39 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 01:12:00,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4199570.0, ans=0.125 2024-08-19 01:12:06,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4199670.0, ans=0.0 2024-08-19 01:12:33,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4199870.0, ans=0.0 2024-08-19 01:12:40,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-08-19 01:12:42,794 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 01:12:57,014 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.11 vs. limit=15.0 2024-08-19 01:12:57,538 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4400, loss[loss=0.1065, beats_loss=0.01203, ecapa_loss=0.0001172, whisper_loss=0.09333, over 23446.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001445, whisper_loss=0.09082, over 3870447.08 frames. ], batch size: 92, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:13:17,801 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2024-08-19 01:13:18,299 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.359e+01 2.700e+01 2.927e+01 4.297e+01, threshold=5.400e+01, percent-clipped=0.0 2024-08-19 01:13:32,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4200270.0, ans=0.0 2024-08-19 01:13:33,942 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 22 from Vox, 20 fro AS 2024-08-19 01:13:35,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4200270.0, ans=0.125 2024-08-19 01:13:46,332 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 01:13:54,111 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 01:14:05,797 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4450, loss[loss=0.1045, beats_loss=0.01092, ecapa_loss=0.0001412, whisper_loss=0.09213, over 22438.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001446, whisper_loss=0.09008, over 3845309.82 frames. ], batch size: 90, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:14:06,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4200570.0, ans=0.2 2024-08-19 01:14:13,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4200570.0, ans=0.125 2024-08-19 01:14:15,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4200570.0, ans=0.2 2024-08-19 01:14:25,327 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
26 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 01:14:37,125 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2024-08-19 01:14:40,714 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 01:14:45,311 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2024-08-19 01:14:46,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4200870.0, ans=0.125 2024-08-19 01:14:57,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4200870.0, ans=0.125 2024-08-19 01:15:13,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4500, loss[loss=0.09257, beats_loss=0.01267, ecapa_loss=0.0001454, whisper_loss=0.07844, over 19316.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001448, whisper_loss=0.09025, over 3839702.61 frames. ], batch size: 78, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:15:30,355 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 33 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 01:15:34,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.278e+01 2.479e+01 2.905e+01 4.149e+01, threshold=4.957e+01, percent-clipped=0.0 2024-08-19 01:15:56,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4201370.0, ans=0.125 2024-08-19 01:16:22,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4550, loss[loss=0.1019, beats_loss=0.009274, ecapa_loss=0.0001301, whisper_loss=0.09134, over 22678.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001448, whisper_loss=0.09021, over 3852906.59 frames. ], batch size: 87, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:16:27,199 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 36 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 01:16:28,289 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 01:16:30,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4201570.0, ans=0.125 2024-08-19 01:16:33,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4201570.0, ans=0.0 2024-08-19 01:16:35,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4201670.0, ans=0.05 2024-08-19 01:16:37,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4201670.0, ans=0.125 2024-08-19 01:16:49,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4201770.0, ans=0.025 2024-08-19 01:16:53,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4201770.0, ans=0.125 2024-08-19 01:17:07,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4201870.0, ans=0.125 2024-08-19 01:17:21,812 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
33 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 01:17:22,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4201970.0, ans=0.0 2024-08-19 01:17:31,564 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4600, loss[loss=0.08791, beats_loss=0.009576, ecapa_loss=0.0001842, whisper_loss=0.0765, over 14342.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01038, ecapa_loss=0.000145, whisper_loss=0.08997, over 3856651.07 frames. ], batch size: 62, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:17:34,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4202070.0, ans=0.1 2024-08-19 01:17:48,686 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 01:17:51,431 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 01:17:52,504 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.339e+01 2.630e+01 2.939e+01 5.094e+01, threshold=5.261e+01, percent-clipped=1.0 2024-08-19 01:17:57,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4202170.0, ans=0.0 2024-08-19 01:18:00,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4202270.0, ans=0.0 2024-08-19 01:18:06,217 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 01:18:11,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. 
limit=15.0 2024-08-19 01:18:19,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4202370.0, ans=0.2 2024-08-19 01:18:24,711 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 29 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 01:18:41,888 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4650, loss[loss=0.1066, beats_loss=0.007712, ecapa_loss=0.0001685, whisper_loss=0.0972, over 16890.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01046, ecapa_loss=0.0001445, whisper_loss=0.08967, over 3868840.27 frames. ], batch size: 67, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:18:50,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4202570.0, ans=0.125 2024-08-19 01:18:53,573 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 01:18:55,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 01:19:07,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.42 vs. limit=22.5 2024-08-19 01:19:11,028 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4202770.0, ans=0.125 2024-08-19 01:19:24,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. 
limit=15.0 2024-08-19 01:19:32,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4202870.0, ans=0.0 2024-08-19 01:19:38,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4202970.0, ans=0.1 2024-08-19 01:19:53,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4700, loss[loss=0.1086, beats_loss=0.00957, ecapa_loss=0.0001185, whisper_loss=0.09789, over 19933.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001435, whisper_loss=0.08994, over 3876129.85 frames. ], batch size: 75, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:20:09,766 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-19 01:20:11,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4203170.0, ans=0.125 2024-08-19 01:20:13,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.364e+01 2.626e+01 2.952e+01 4.706e+01, threshold=5.252e+01, percent-clipped=0.0 2024-08-19 01:20:36,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4203370.0, ans=0.0 2024-08-19 01:20:58,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-08-19 01:21:01,436 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4750, loss[loss=0.1222, beats_loss=0.00831, ecapa_loss=0.0001892, whisper_loss=0.112, over 13529.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01041, ecapa_loss=0.0001435, whisper_loss=0.09025, over 3872131.84 frames. 
], batch size: 53, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:21:03,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4203570.0, ans=0.0 2024-08-19 01:21:06,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4203570.0, ans=0.2 2024-08-19 01:21:14,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4203670.0, ans=0.125 2024-08-19 01:21:19,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.73 vs. limit=15.0 2024-08-19 01:21:24,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4203670.0, ans=0.125 2024-08-19 01:21:51,402 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2024-08-19 01:21:56,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4203970.0, ans=0.1 2024-08-19 01:22:00,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4203970.0, ans=0.2 2024-08-19 01:22:08,376 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4800, loss[loss=0.1182, beats_loss=0.01214, ecapa_loss=0.0001209, whisper_loss=0.1049, over 19924.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001442, whisper_loss=0.0901, over 3876503.62 frames. 
], batch size: 79, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:22:09,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4204070.0, ans=0.125 2024-08-19 01:22:27,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.297e+01 2.510e+01 2.780e+01 4.241e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-19 01:22:32,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4204170.0, ans=0.125 2024-08-19 01:22:36,579 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 28 from Vox, 42 fro AS 2024-08-19 01:22:38,413 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.34 vs. limit=15.0 2024-08-19 01:22:44,254 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-08-19 01:22:59,165 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 31 from Vox, 27 fro AS 2024-08-19 01:23:05,154 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=4.33 vs. limit=12.0 2024-08-19 01:23:12,546 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 01:23:16,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4850, loss[loss=0.1217, beats_loss=0.008051, ecapa_loss=0.0001747, whisper_loss=0.1119, over 21551.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001444, whisper_loss=0.09025, over 3903127.21 frames. 
], batch size: 86, lr: 2.12e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 01:23:30,882 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. limit=15.0 2024-08-19 01:23:40,090 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 16 from Vox, 21 fro AS 2024-08-19 01:23:46,215 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-08-19 01:24:26,356 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4900, loss[loss=0.08437, beats_loss=0.01343, ecapa_loss=0.0001287, whisper_loss=0.06965, over 18196.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01038, ecapa_loss=0.0001429, whisper_loss=0.09084, over 3919919.54 frames. ], batch size: 74, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:24:41,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4205170.0, ans=0.125 2024-08-19 01:24:43,687 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 01:24:48,353 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:24:49,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.329e+01 2.525e+01 2.915e+01 4.394e+02, threshold=5.050e+01, percent-clipped=2.0 2024-08-19 01:24:50,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4205170.0, ans=0.1 2024-08-19 01:24:51,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4205170.0, ans=0.1 2024-08-19 01:25:30,085 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
34 from LS+wenet, 18 from Vox, 20 fro AS 2024-08-19 01:25:31,742 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 01:25:36,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4205570.0, ans=0.125 2024-08-19 01:25:36,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 4950, loss[loss=0.1032, beats_loss=0.01006, ecapa_loss=0.0001552, whisper_loss=0.0916, over 21345.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01029, ecapa_loss=0.0001443, whisper_loss=0.09128, over 3870558.56 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:25:38,344 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 01:25:53,726 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.81 vs. limit=10.0 2024-08-19 01:26:05,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-08-19 01:26:13,542 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 21 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 01:26:19,330 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 01:26:36,258 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 01:26:38,614 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 01:26:46,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5000, loss[loss=0.1062, beats_loss=0.009839, ecapa_loss=0.0001244, whisper_loss=0.09508, over 17794.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01028, ecapa_loss=0.0001443, whisper_loss=0.09172, over 3876837.35 frames. 
], batch size: 67, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:26:56,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4206070.0, ans=0.2 2024-08-19 01:27:07,008 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.303e+01 2.548e+01 2.762e+01 6.852e+01, threshold=5.096e+01, percent-clipped=1.0 2024-08-19 01:27:15,394 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 01:27:17,082 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 11 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 01:27:23,350 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2024-08-19 01:27:24,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4206270.0, ans=0.125 2024-08-19 01:27:50,735 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 28 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 01:27:55,593 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 01:27:59,240 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5050, loss[loss=0.09359, beats_loss=0.01149, ecapa_loss=0.0001487, whisper_loss=0.08061, over 21130.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001436, whisper_loss=0.09143, over 3896435.14 frames. ], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:28:13,211 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 01:28:14,369 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
28 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 01:28:28,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4206770.0, ans=0.125 2024-08-19 01:28:44,205 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 22 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 01:28:49,631 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 01:28:51,837 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:29:01,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4206970.0, ans=0.125 2024-08-19 01:29:07,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4206970.0, ans=0.0 2024-08-19 01:29:12,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5100, loss[loss=0.1063, beats_loss=0.01136, ecapa_loss=0.0001207, whisper_loss=0.0937, over 18193.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01041, ecapa_loss=0.0001432, whisper_loss=0.09151, over 3863549.47 frames. ], batch size: 72, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:29:12,764 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 01:29:19,944 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 01:29:24,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4207070.0, ans=0.125 2024-08-19 01:29:26,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.98 vs. 
limit=22.5 2024-08-19 01:29:28,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4207170.0, ans=0.125 2024-08-19 01:29:33,760 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.980e+01 2.355e+01 2.571e+01 2.795e+01 7.505e+01, threshold=5.142e+01, percent-clipped=1.0 2024-08-19 01:30:11,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4207470.0, ans=15.0 2024-08-19 01:30:13,650 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 01:30:25,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5150, loss[loss=0.1031, beats_loss=0.01021, ecapa_loss=0.000149, whisper_loss=0.09142, over 17762.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01043, ecapa_loss=0.0001422, whisper_loss=0.09147, over 3869090.70 frames. ], batch size: 73, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:30:30,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=4207570.0, ans=0.2 2024-08-19 01:30:36,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4207570.0, ans=0.125 2024-08-19 01:30:42,702 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 01:30:48,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4207670.0, ans=0.2 2024-08-19 01:30:51,878 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
36 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 01:30:55,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4207770.0, ans=0.1 2024-08-19 01:31:02,012 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4207770.0, ans=0.2 2024-08-19 01:31:06,237 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 01:31:19,518 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4207870.0, ans=0.1 2024-08-19 01:31:27,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4207970.0, ans=0.1 2024-08-19 01:31:31,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4207970.0, ans=0.125 2024-08-19 01:31:37,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4207970.0, ans=0.05 2024-08-19 01:31:40,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5200, loss[loss=0.1411, beats_loss=0.007355, ecapa_loss=0.0001325, whisper_loss=0.1324, over 23507.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.09196, over 3860345.61 frames. 
], batch size: 88, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:31:40,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4208070.0, ans=0.0 2024-08-19 01:31:43,820 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4208070.0, ans=0.0 2024-08-19 01:32:00,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.279e+01 2.485e+01 2.777e+01 3.905e+01, threshold=4.969e+01, percent-clipped=0.0 2024-08-19 01:32:00,884 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 01:32:03,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4208170.0, ans=0.0 2024-08-19 01:32:06,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4208170.0, ans=0.125 2024-08-19 01:32:26,172 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 01:32:32,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4208370.0, ans=0.125 2024-08-19 01:32:41,744 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 01:32:45,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4208470.0, ans=0.125 2024-08-19 01:32:53,284 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5250, loss[loss=0.07775, beats_loss=0.01304, ecapa_loss=0.000139, whisper_loss=0.06331, over 16440.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001413, whisper_loss=0.09083, over 3808288.20 frames. 
], batch size: 68, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:33:08,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4208670.0, ans=0.125 2024-08-19 01:33:25,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4208770.0, ans=0.125 2024-08-19 01:33:29,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4208770.0, ans=0.0 2024-08-19 01:33:37,098 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 01:33:39,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4208870.0, ans=0.125 2024-08-19 01:33:39,117 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:33:50,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4208870.0, ans=0.2 2024-08-19 01:33:51,752 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 01:33:58,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4208970.0, ans=0.125 2024-08-19 01:34:01,293 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 01:34:01,965 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4208970.0, ans=0.125 2024-08-19 01:34:04,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4208970.0, ans=0.2 2024-08-19 01:34:10,178 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5300, loss[loss=0.07834, beats_loss=0.01044, ecapa_loss=0.0001453, whisper_loss=0.06645, over 14775.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001423, whisper_loss=0.09084, over 3832720.02 frames. ], batch size: 58, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:34:11,618 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 01:34:31,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4209170.0, ans=0.125 2024-08-19 01:34:32,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.920e+01 2.347e+01 2.623e+01 3.004e+01 4.261e+01, threshold=5.246e+01, percent-clipped=0.0 2024-08-19 01:34:56,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4209370.0, ans=0.125 2024-08-19 01:35:02,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4209370.0, ans=0.1 2024-08-19 01:35:07,011 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
19 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 01:35:12,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4209470.0, ans=0.125 2024-08-19 01:35:28,128 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5350, loss[loss=0.0927, beats_loss=0.01307, ecapa_loss=9.651e-05, whisper_loss=0.07866, over 22904.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001412, whisper_loss=0.09055, over 3823776.98 frames. ], batch size: 91, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:35:34,381 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 25 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 01:35:54,344 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 01:35:56,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4209670.0, ans=0.0 2024-08-19 01:36:05,584 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.509e-01 2024-08-19 01:36:32,383 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 01:36:47,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4210070.0, ans=0.125 2024-08-19 01:36:48,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5400, loss[loss=0.08271, beats_loss=0.01237, ecapa_loss=0.000135, whisper_loss=0.06899, over 19207.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.09053, over 3830276.61 frames. ], batch size: 80, lr: 2.12e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:36:55,630 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
21 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 01:37:08,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4210170.0, ans=0.0 2024-08-19 01:37:09,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4210170.0, ans=0.0 2024-08-19 01:37:11,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.661e+01 2.258e+01 2.667e+01 2.927e+01 2.051e+02, threshold=5.334e+01, percent-clipped=3.0 2024-08-19 01:37:18,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4210270.0, ans=0.1 2024-08-19 01:37:19,564 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 01:37:21,072 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 01:37:30,409 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 01:37:30,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4210270.0, ans=0.0 2024-08-19 01:37:36,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4210370.0, ans=0.125 2024-08-19 01:37:53,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4210470.0, ans=0.125 2024-08-19 01:37:53,933 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
35 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 01:37:55,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4210470.0, ans=0.2 2024-08-19 01:38:08,052 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5450, loss[loss=0.1222, beats_loss=0.008104, ecapa_loss=0.0001335, whisper_loss=0.1127, over 19094.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01042, ecapa_loss=0.0001421, whisper_loss=0.09135, over 3839044.01 frames. ], batch size: 74, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:38:44,096 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 01:38:53,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4210870.0, ans=0.125 2024-08-19 01:38:56,692 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 01:39:13,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4210970.0, ans=0.125 2024-08-19 01:39:17,538 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 01:39:21,322 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5500, loss[loss=0.1154, beats_loss=0.01026, ecapa_loss=0.0001527, whisper_loss=0.1036, over 21747.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01047, ecapa_loss=0.0001416, whisper_loss=0.09105, over 3860412.45 frames. ], batch size: 85, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:39:36,380 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 01:39:43,072 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.049e+01 2.383e+01 2.517e+01 2.787e+01 3.399e+01, threshold=5.035e+01, percent-clipped=0.0 2024-08-19 01:39:51,671 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 13 from LS+wenet, 23 from Vox, 54 fro AS 2024-08-19 01:39:57,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4211270.0, ans=0.0 2024-08-19 01:40:12,720 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4211370.0, ans=0.0 2024-08-19 01:40:16,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4211470.0, ans=0.125 2024-08-19 01:40:22,141 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=12.0 2024-08-19 01:40:23,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4211470.0, ans=0.125 2024-08-19 01:40:30,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5550, loss[loss=0.1235, beats_loss=0.008584, ecapa_loss=0.0001783, whisper_loss=0.1132, over 22073.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001421, whisper_loss=0.09051, over 3894905.00 frames. 
], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:40:31,024 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4211570.0, ans=0.1 2024-08-19 01:40:36,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4211570.0, ans=0.125 2024-08-19 01:40:39,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4211570.0, ans=0.0 2024-08-19 01:40:41,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4211570.0, ans=0.035 2024-08-19 01:40:50,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4211670.0, ans=0.125 2024-08-19 01:40:53,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4211670.0, ans=0.5 2024-08-19 01:41:06,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.50 vs. limit=15.0 2024-08-19 01:41:06,822 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 01:41:14,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4211870.0, ans=0.125 2024-08-19 01:41:30,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4211970.0, ans=0.5 2024-08-19 01:41:31,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4211970.0, ans=0.2 2024-08-19 01:41:36,432 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5600, loss[loss=0.115, beats_loss=0.009898, ecapa_loss=0.0001221, whisper_loss=0.1039, over 22675.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0106, ecapa_loss=0.0001411, whisper_loss=0.08968, over 3906769.60 frames. ], batch size: 87, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:41:39,686 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.21 vs. 
limit=22.5 2024-08-19 01:41:46,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4212070.0, ans=0.125 2024-08-19 01:41:56,850 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.373e+01 2.558e+01 2.737e+01 3.942e+01, threshold=5.116e+01, percent-clipped=0.0 2024-08-19 01:42:07,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4212270.0, ans=0.1 2024-08-19 01:42:26,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4212370.0, ans=0.0 2024-08-19 01:42:37,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4212470.0, ans=0.1 2024-08-19 01:42:37,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4212470.0, ans=0.1 2024-08-19 01:42:43,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5650, loss[loss=0.07912, beats_loss=0.01361, ecapa_loss=0.0001012, whisper_loss=0.0645, over 14128.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001411, whisper_loss=0.08951, over 3903727.87 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:42:52,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4212570.0, ans=0.2 2024-08-19 01:42:56,325 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2024-08-19 01:43:05,321 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
20 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 01:43:08,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4212670.0, ans=0.0 2024-08-19 01:43:09,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4212770.0, ans=0.125 2024-08-19 01:43:25,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4212870.0, ans=0.125 2024-08-19 01:43:28,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4212870.0, ans=0.1 2024-08-19 01:43:31,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4212870.0, ans=0.1 2024-08-19 01:43:35,178 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.01 vs. limit=22.5 2024-08-19 01:43:42,294 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 01:43:51,631 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5700, loss[loss=0.08302, beats_loss=0.01203, ecapa_loss=0.0001355, whisper_loss=0.06964, over 16393.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01062, ecapa_loss=0.0001417, whisper_loss=0.08942, over 3942477.78 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:43:55,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.10 vs. limit=15.0 2024-08-19 01:44:07,861 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 01:44:12,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4213170.0, ans=0.0 2024-08-19 01:44:12,713 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.394e+01 2.632e+01 3.086e+01 9.254e+01, threshold=5.264e+01, percent-clipped=1.0 2024-08-19 01:44:13,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4213170.0, ans=0.125 2024-08-19 01:44:17,922 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 01:44:19,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4213270.0, ans=0.0 2024-08-19 01:44:38,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4213370.0, ans=0.125 2024-08-19 01:44:49,664 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:45:01,725 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5750, loss[loss=0.1176, beats_loss=0.01203, ecapa_loss=0.0001294, whisper_loss=0.1043, over 23309.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01061, ecapa_loss=0.0001416, whisper_loss=0.0896, over 3913772.26 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:45:03,565 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 01:45:06,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4213570.0, ans=0.125 2024-08-19 01:45:14,235 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 01:45:14,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4213670.0, ans=0.125 2024-08-19 01:45:14,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4213670.0, ans=0.1 2024-08-19 01:45:21,976 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 01:45:44,906 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 22 from Vox, 26 fro AS 2024-08-19 01:45:53,913 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 01:46:13,290 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5800, loss[loss=0.08474, beats_loss=0.01199, ecapa_loss=0.0001596, whisper_loss=0.07115, over 21464.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01054, ecapa_loss=0.0001422, whisper_loss=0.08942, over 3861097.26 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:46:35,940 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.316e+01 2.608e+01 2.916e+01 4.627e+01, threshold=5.217e+01, percent-clipped=0.0 2024-08-19 01:46:36,429 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4214170.0, ans=0.1 2024-08-19 01:46:37,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.02 vs. limit=12.0 2024-08-19 01:46:41,725 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 01:46:59,337 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 01:47:07,187 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
24 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 01:47:07,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4214370.0, ans=0.2 2024-08-19 01:47:10,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4214470.0, ans=0.025 2024-08-19 01:47:19,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4214470.0, ans=0.07 2024-08-19 01:47:19,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4214470.0, ans=10.0 2024-08-19 01:47:24,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5850, loss[loss=0.105, beats_loss=0.01025, ecapa_loss=0.0001628, whisper_loss=0.09308, over 21359.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001424, whisper_loss=0.08937, over 3864174.52 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:47:25,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4214570.0, ans=0.2 2024-08-19 01:47:33,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4214570.0, ans=0.1 2024-08-19 01:47:38,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.76 vs. 
limit=10.0 2024-08-19 01:47:43,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4214670.0, ans=0.1 2024-08-19 01:47:54,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4214770.0, ans=0.0 2024-08-19 01:47:59,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4214770.0, ans=0.125 2024-08-19 01:48:00,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4214770.0, ans=0.125 2024-08-19 01:48:17,508 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 01:48:36,017 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 01:48:37,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5900, loss[loss=0.09443, beats_loss=0.01153, ecapa_loss=0.0001727, whisper_loss=0.08118, over 18402.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001419, whisper_loss=0.08983, over 3891655.40 frames. ], batch size: 76, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:48:41,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4215070.0, ans=0.2 2024-08-19 01:48:46,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4215070.0, ans=0.0 2024-08-19 01:48:50,132 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 16 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 01:48:57,705 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.293e+01 2.484e+01 2.776e+01 5.070e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 01:48:59,235 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
38 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 01:49:11,003 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 01:49:21,555 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 01:49:37,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4215470.0, ans=0.125 2024-08-19 01:49:51,560 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 5950, loss[loss=0.1035, beats_loss=0.01175, ecapa_loss=0.0001724, whisper_loss=0.08999, over 14742.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01058, ecapa_loss=0.0001421, whisper_loss=0.08916, over 3887636.10 frames. ], batch size: 63, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:50:11,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4215670.0, ans=0.125 2024-08-19 01:50:15,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0 2024-08-19 01:50:27,823 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.79 vs. limit=10.0 2024-08-19 01:50:33,202 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 01:50:34,633 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 01:50:36,189 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 01:50:51,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. 
limit=10.0 2024-08-19 01:51:08,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4215870.0, ans=0.5 2024-08-19 01:51:09,658 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 01:51:09,739 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4215970.0, ans=0.125 2024-08-19 01:51:12,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4215970.0, ans=0.0 2024-08-19 01:51:21,908 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 01:51:24,388 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6000, loss[loss=0.1023, beats_loss=0.01112, ecapa_loss=0.0001388, whisper_loss=0.08983, over 22034.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001422, whisper_loss=0.09002, over 3904308.04 frames. ], batch size: 87, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:51:24,388 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 01:52:05,759 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([1.5778, 1.4798, 2.6135, 2.5380], device='cuda:3') 2024-08-19 01:52:18,338 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on ASR_libri: loss=0.2515, beats_loss=0, ecapa_loss=0.0005229, whisper_loss=0.2463, over 922467.00 frames. 2024-08-19 01:52:36,623 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on SV_voxceleb1: loss=0.003944, beats_loss=0, ecapa_loss=0.0003944, whisper_loss=0, over 939242.00 frames. 
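[Editor's annotation.] The per-batch `loss=` values in the entries above can be reconstructed from the three knowledge-distillation component losses. This is a minimal sketch, not the icefall source: the scale values are taken from the run configuration printed at the top of this log (`beats_loss_scale=1.0`, `ecapa_loss_scale=10.0`, `whisper_loss_scale=1.0`), and the function name `total_loss` is hypothetical.

```python
# Sketch: reconstructing the logged total loss from its KD components,
# using the loss scales recorded in this run's config header
# (beats_loss_scale=1.0, ecapa_loss_scale=10.0, whisper_loss_scale=1.0).
# Not the actual train_multi_KD3.py code.

BEATS_SCALE = 1.0
ECAPA_SCALE = 10.0
WHISPER_SCALE = 1.0

def total_loss(beats_loss: float, ecapa_loss: float, whisper_loss: float) -> float:
    """Weighted sum of the three distillation losses, as reported in `loss=`."""
    return (BEATS_SCALE * beats_loss
            + ECAPA_SCALE * ecapa_loss
            + WHISPER_SCALE * whisper_loss)

# Example from the "Epoch 29, batch 5850" entry above:
# loss=0.105, beats_loss=0.01025, ecapa_loss=0.0001628, whisper_loss=0.09308
print(round(total_loss(0.01025, 0.0001628, 0.09308), 3))  # → 0.105
```

The same identity holds for the other batch entries in this log, e.g. batch 5900: 0.01153 + 10 × 0.0001727 + 0.08118 ≈ 0.09443.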
2024-08-19 01:54:14,335 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1579, 2.5866, 2.4958, 2.3362], device='cuda:3') 2024-08-19 01:55:19,279 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on AT_audioset: loss=0.02306, beats_loss=0.02306, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 01:55:19,283 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 01:55:45,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4216170.0, ans=0.07 2024-08-19 01:55:48,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.845e+01 2.345e+01 2.612e+01 2.923e+01 8.240e+01, threshold=5.224e+01, percent-clipped=1.0 2024-08-19 01:55:52,074 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 12 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 01:55:55,136 WARNING [optim.py:496] (3/4) Scaling gradients by 0.028258753940463066, model_norm_threshold=52.240760803222656 2024-08-19 01:55:55,310 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.18, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=6.066e+05, grad_sumsq=6.066e+05, orig_rms_sq=1.000e+00 2024-08-19 01:55:56,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4216270.0, ans=0.1 2024-08-19 01:56:06,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. limit=10.0 2024-08-19 01:56:22,219 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 01:56:35,600 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
32 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 01:56:56,739 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6050, loss[loss=0.1072, beats_loss=0.01006, ecapa_loss=0.0001592, whisper_loss=0.09556, over 14995.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01061, ecapa_loss=0.000143, whisper_loss=0.08996, over 3866839.98 frames. ], batch size: 61, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:56:58,448 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 01:56:59,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4216570.0, ans=0.0 2024-08-19 01:57:07,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4216570.0, ans=0.125 2024-08-19 01:57:10,184 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 01:57:19,094 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-08-19 01:57:25,635 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 01:57:35,213 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 01:57:38,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.31 vs. limit=15.0 2024-08-19 01:57:44,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4216870.0, ans=0.0 2024-08-19 01:58:00,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.08 vs. 
limit=15.0 2024-08-19 01:58:11,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6100, loss[loss=0.1012, beats_loss=0.01104, ecapa_loss=0.0001607, whisper_loss=0.08856, over 21264.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01066, ecapa_loss=0.000142, whisper_loss=0.08979, over 3891992.35 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:58:22,992 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 01:58:27,171 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 01:58:31,662 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.816e+01 2.328e+01 2.727e+01 2.996e+01 1.849e+03, threshold=5.454e+01, percent-clipped=1.0 2024-08-19 01:58:53,883 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 01:59:05,246 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2024-08-19 01:59:17,029 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 01:59:19,528 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6150, loss[loss=0.1031, beats_loss=0.009, ecapa_loss=0.0001579, whisper_loss=0.09255, over 19923.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001412, whisper_loss=0.08967, over 3910464.44 frames. ], batch size: 80, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 01:59:23,806 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 01:59:25,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4217570.0, ans=0.125 2024-08-19 01:59:36,370 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 01:59:57,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4217770.0, ans=0.2 2024-08-19 02:00:15,988 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 02:00:29,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6200, loss[loss=0.09857, beats_loss=0.01122, ecapa_loss=0.0001075, whisper_loss=0.08627, over 14519.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01057, ecapa_loss=0.0001421, whisper_loss=0.09026, over 3888677.13 frames. ], batch size: 57, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:00:40,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4218070.0, ans=0.0 2024-08-19 02:00:48,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4218170.0, ans=0.125 2024-08-19 02:00:49,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4218170.0, ans=0.125 2024-08-19 02:00:50,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.664e+01 2.249e+01 2.459e+01 2.825e+01 3.741e+01, threshold=4.918e+01, percent-clipped=0.0 2024-08-19 02:01:06,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4218270.0, ans=0.1 2024-08-19 02:01:12,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4218370.0, ans=0.1 2024-08-19 02:01:13,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. 
limit=15.0 2024-08-19 02:01:19,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4218370.0, ans=0.125 2024-08-19 02:01:34,050 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 02:01:34,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4218470.0, ans=0.0 2024-08-19 02:01:40,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6250, loss[loss=0.09839, beats_loss=0.009181, ecapa_loss=0.0001397, whisper_loss=0.08781, over 20958.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001427, whisper_loss=0.09054, over 3913489.76 frames. ], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:01:59,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4218670.0, ans=0.0 2024-08-19 02:02:06,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4218670.0, ans=0.035 2024-08-19 02:02:27,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4218870.0, ans=0.2 2024-08-19 02:02:32,457 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 10 from Vox, 24 fro AS 2024-08-19 02:02:38,384 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 02:02:50,741 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6300, loss[loss=0.09776, beats_loss=0.01063, ecapa_loss=0.0001551, whisper_loss=0.08558, over 16788.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.0001433, whisper_loss=0.09086, over 3876733.86 frames. 
], batch size: 71, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:02:53,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4219070.0, ans=0.125 2024-08-19 02:03:08,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4219170.0, ans=0.125 2024-08-19 02:03:11,743 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.391e+01 2.571e+01 3.038e+01 7.293e+01, threshold=5.142e+01, percent-clipped=2.0 2024-08-19 02:03:13,442 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:03:29,835 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 23 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 02:03:30,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4219270.0, ans=0.125 2024-08-19 02:03:33,074 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=4.270e+01 2024-08-19 02:03:35,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4219370.0, ans=0.0 2024-08-19 02:03:37,820 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 02:03:42,099 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 02:03:51,248 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 02:03:53,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4219470.0, ans=0.05 2024-08-19 02:03:55,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. 
limit=15.0 2024-08-19 02:03:59,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6350, loss[loss=0.1058, beats_loss=0.01049, ecapa_loss=0.0001508, whisper_loss=0.09379, over 22264.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.08998, over 3873738.52 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:04:06,142 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 02:04:21,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4219670.0, ans=0.125 2024-08-19 02:04:23,226 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=15.31 vs. limit=15.0 2024-08-19 02:04:39,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4219770.0, ans=0.125 2024-08-19 02:04:42,913 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:04:43,925 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 38 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 02:04:44,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4219870.0, ans=0.125 2024-08-19 02:04:45,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4219870.0, ans=0.2 2024-08-19 02:04:45,899 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=15.0 2024-08-19 02:04:58,000 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 02:05:04,102 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
28 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 02:05:09,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6400, loss[loss=0.1072, beats_loss=0.01046, ecapa_loss=0.0001125, whisper_loss=0.09564, over 23614.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001414, whisper_loss=0.0898, over 3890280.16 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:05:12,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4220070.0, ans=0.0 2024-08-19 02:05:24,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4220170.0, ans=0.125 2024-08-19 02:05:26,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2024-08-19 02:05:31,204 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.582e+01 2.283e+01 2.522e+01 2.730e+01 4.061e+01, threshold=5.044e+01, percent-clipped=0.0 2024-08-19 02:05:34,033 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 02:05:40,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4220270.0, ans=0.0 2024-08-19 02:06:04,591 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.18 vs. limit=22.5 2024-08-19 02:06:19,670 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6450, loss[loss=0.1099, beats_loss=0.009936, ecapa_loss=0.0001371, whisper_loss=0.09861, over 19912.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01052, ecapa_loss=0.0001411, whisper_loss=0.09063, over 3923884.37 frames. 
], batch size: 76, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:06:23,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=6.47 vs. limit=12.0 2024-08-19 02:06:44,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4220670.0, ans=0.125 2024-08-19 02:06:57,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4220770.0, ans=0.0 2024-08-19 02:07:03,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4220870.0, ans=0.1 2024-08-19 02:07:08,622 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 02:07:29,974 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6500, loss[loss=0.08739, beats_loss=0.01019, ecapa_loss=0.0001732, whisper_loss=0.07547, over 18188.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001417, whisper_loss=0.09067, over 3893862.77 frames. ], batch size: 77, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:07:39,984 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 26 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 02:07:40,453 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4221070.0, ans=0.125 2024-08-19 02:07:40,791 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.94 vs. 
limit=10.0 2024-08-19 02:07:50,819 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.411e+01 2.588e+01 2.957e+01 3.943e+01, threshold=5.175e+01, percent-clipped=0.0 2024-08-19 02:07:52,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4221170.0, ans=0.0 2024-08-19 02:08:11,126 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 02:08:13,531 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 02:08:38,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6550, loss[loss=0.1201, beats_loss=0.0106, ecapa_loss=0.0001519, whisper_loss=0.108, over 22387.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01047, ecapa_loss=0.0001431, whisper_loss=0.09143, over 3905099.03 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:08:47,087 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 02:08:53,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4221670.0, ans=0.125 2024-08-19 02:09:00,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4221670.0, ans=0.0 2024-08-19 02:09:02,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.37 vs. limit=22.5 2024-08-19 02:09:08,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=12.0 2024-08-19 02:09:49,329 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6600, loss[loss=0.101, beats_loss=0.01157, ecapa_loss=0.0001085, whisper_loss=0.0883, over 15667.00 frames. 
], tot_loss[loss=0.1038, beats_loss=0.01039, ecapa_loss=0.0001437, whisper_loss=0.09198, over 3926857.12 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:10:04,957 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 02:10:09,809 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.434e+01 2.631e+01 2.972e+01 4.626e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-19 02:10:10,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.31 vs. limit=12.0 2024-08-19 02:10:33,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-08-19 02:10:58,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6650, loss[loss=0.09559, beats_loss=0.01051, ecapa_loss=0.0001655, whisper_loss=0.08343, over 22598.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01048, ecapa_loss=0.0001429, whisper_loss=0.09149, over 3944654.10 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:11:03,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4222570.0, ans=0.015 2024-08-19 02:11:10,284 WARNING [optim.py:496] (3/4) Scaling gradients by 0.04424963891506195, model_norm_threshold=52.611846923828125 2024-08-19 02:11:10,454 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.224e+05, grad_sumsq=2.136e+07, orig_rms_sq=1.041e-02 2024-08-19 02:11:21,420 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 16 from Vox, 44 fro AS 2024-08-19 02:11:33,520 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
18 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 02:11:38,719 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 02:11:42,282 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-19 02:11:57,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4222970.0, ans=0.1 2024-08-19 02:12:05,641 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 02:12:09,159 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6700, loss[loss=0.09255, beats_loss=0.01143, ecapa_loss=0.00012, whisper_loss=0.07992, over 18018.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01045, ecapa_loss=0.0001437, whisper_loss=0.09136, over 3944425.21 frames. ], batch size: 70, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:12:31,459 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.954e+01 2.376e+01 2.691e+01 2.989e+01 1.189e+03, threshold=5.381e+01, percent-clipped=5.0 2024-08-19 02:12:37,723 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 02:12:51,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4223370.0, ans=0.2 2024-08-19 02:12:58,233 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 02:13:16,824 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 16 from Vox, 48 fro AS 2024-08-19 02:13:20,618 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6750, loss[loss=0.09724, beats_loss=0.01046, ecapa_loss=0.0001645, whisper_loss=0.08513, over 20643.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01043, ecapa_loss=0.0001433, whisper_loss=0.0914, over 3937221.50 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:13:27,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2024-08-19 02:13:41,419 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 02:13:46,663 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 02:13:49,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4223770.0, ans=0.05 2024-08-19 02:14:03,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4223870.0, ans=0.0 2024-08-19 02:14:20,714 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 27 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 02:14:28,904 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6800, loss[loss=0.08164, beats_loss=0.01006, ecapa_loss=0.0001546, whisper_loss=0.07003, over 21810.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.0001441, whisper_loss=0.09067, over 3914343.15 frames. 
], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:14:29,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4224070.0, ans=0.125 2024-08-19 02:14:49,982 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.425e+01 2.571e+01 2.858e+01 3.712e+02, threshold=5.143e+01, percent-clipped=3.0 2024-08-19 02:14:53,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4224170.0, ans=0.2 2024-08-19 02:15:17,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4224370.0, ans=0.125 2024-08-19 02:15:20,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4224370.0, ans=0.0 2024-08-19 02:15:35,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4224470.0, ans=0.125 2024-08-19 02:15:37,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6850, loss[loss=0.1032, beats_loss=0.011, ecapa_loss=0.0001211, whisper_loss=0.09096, over 16836.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001437, whisper_loss=0.09084, over 3885488.54 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:15:40,026 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 02:15:41,525 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:15:48,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4224570.0, ans=0.0 2024-08-19 02:16:01,257 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
21 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 02:16:06,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4224770.0, ans=0.0 2024-08-19 02:16:09,735 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4224770.0, ans=0.0 2024-08-19 02:16:18,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4224870.0, ans=0.125 2024-08-19 02:16:18,417 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.17 vs. limit=15.0 2024-08-19 02:16:46,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6900, loss[loss=0.1152, beats_loss=0.00778, ecapa_loss=0.0001663, whisper_loss=0.1057, over 19792.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001427, whisper_loss=0.09036, over 3892695.08 frames. ], batch size: 76, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:16:49,627 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.577e+01 2024-08-19 02:17:04,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4225170.0, ans=0.5 2024-08-19 02:17:06,152 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.295e+01 2.515e+01 2.694e+01 1.143e+02, threshold=5.030e+01, percent-clipped=2.0 2024-08-19 02:17:18,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4225270.0, ans=0.1 2024-08-19 02:17:23,710 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 02:17:47,654 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 02:17:52,722 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 6950, loss[loss=0.1051, beats_loss=0.0116, ecapa_loss=0.0001024, whisper_loss=0.09249, over 22790.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001416, whisper_loss=0.09059, over 3888850.27 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:17:55,595 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 02:18:09,634 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:18:23,302 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2024-08-19 02:18:27,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4225770.0, ans=0.125 2024-08-19 02:18:28,494 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-19 02:18:38,191 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 02:18:49,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4225970.0, ans=0.125 2024-08-19 02:18:58,850 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7000, loss[loss=0.1044, beats_loss=0.01039, ecapa_loss=0.0001122, whisper_loss=0.09286, over 17585.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001431, whisper_loss=0.09072, over 3891331.75 frames. 
], batch size: 66, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:19:18,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.286e+01 2.533e+01 2.808e+01 4.798e+01, threshold=5.066e+01, percent-clipped=0.0 2024-08-19 02:19:39,047 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4226370.0, ans=0.2 2024-08-19 02:19:42,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4226370.0, ans=0.0 2024-08-19 02:19:43,611 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 02:20:02,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7050, loss[loss=0.08821, beats_loss=0.01055, ecapa_loss=0.0001694, whisper_loss=0.07596, over 20639.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001438, whisper_loss=0.09043, over 3878092.85 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:20:25,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.33 vs. limit=12.0 2024-08-19 02:20:28,492 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4226770.0, ans=0.125 2024-08-19 02:20:38,453 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 40 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 02:20:43,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4226870.0, ans=0.1 2024-08-19 02:20:49,514 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
30 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 02:21:04,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4227070.0, ans=0.125 2024-08-19 02:21:05,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7100, loss[loss=0.1162, beats_loss=0.008004, ecapa_loss=0.0001759, whisper_loss=0.1064, over 21290.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0105, ecapa_loss=0.0001436, whisper_loss=0.09026, over 3865907.86 frames. ], batch size: 87, lr: 2.11e-03, grad_scale: 1.152921504606847e+18 2024-08-19 02:21:05,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4227070.0, ans=0.0 2024-08-19 02:21:15,573 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 02:21:23,810 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.314e+01 2.618e+01 2.953e+01 5.824e+01, threshold=5.237e+01, percent-clipped=1.0 2024-08-19 02:21:36,922 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 02:21:37,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4227270.0, ans=0.0 2024-08-19 02:21:54,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4227370.0, ans=0.125 2024-08-19 02:22:08,102 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7150, loss[loss=0.08862, beats_loss=0.01241, ecapa_loss=0.00016, whisper_loss=0.07461, over 21566.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001436, whisper_loss=0.09045, over 3872332.45 frames. ], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:22:15,903 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
27 from LS+wenet, 25 from Vox, 20 fro AS 2024-08-19 02:22:17,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4227570.0, ans=0.2 2024-08-19 02:22:27,443 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-19 02:22:43,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4227770.0, ans=0.2 2024-08-19 02:22:43,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4227770.0, ans=0.125 2024-08-19 02:22:51,907 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4227870.0, ans=0.125 2024-08-19 02:23:01,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4227970.0, ans=0.0 2024-08-19 02:23:06,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-08-19 02:23:08,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-08-19 02:23:09,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4227970.0, ans=0.5 2024-08-19 02:23:11,469 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7200, loss[loss=0.1014, beats_loss=0.01078, ecapa_loss=0.0001646, whisper_loss=0.08901, over 22479.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.000144, whisper_loss=0.09095, over 3907890.38 frames. 
], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:23:15,288 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 02:23:17,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4228070.0, ans=0.125 2024-08-19 02:23:25,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4228170.0, ans=0.0 2024-08-19 02:23:31,508 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.350e+01 2.585e+01 2.924e+01 4.669e+01, threshold=5.169e+01, percent-clipped=0.0 2024-08-19 02:23:39,311 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:23:52,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2024-08-19 02:24:01,882 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.088e+05 2024-08-19 02:24:08,169 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4228470.0, ans=0.1 2024-08-19 02:24:13,261 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-08-19 02:24:13,879 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7250, loss[loss=0.0784, beats_loss=0.01326, ecapa_loss=0.0001421, whisper_loss=0.06372, over 20583.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001441, whisper_loss=0.08981, over 3936481.46 frames. 
], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:24:39,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4228770.0, ans=0.0 2024-08-19 02:24:44,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4228770.0, ans=0.125 2024-08-19 02:25:00,560 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.37 vs. limit=15.0 2024-08-19 02:25:01,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4228870.0, ans=0.2 2024-08-19 02:25:02,159 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0587669275701046, model_norm_threshold=51.69389343261719 2024-08-19 02:25:02,326 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.3.encoder.layers.3.norm.log_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=8.897e+04, grad_sumsq=8.897e+04, orig_rms_sq=1.000e+00 2024-08-19 02:25:04,763 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2024-08-19 02:25:18,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7300, loss[loss=0.1084, beats_loss=0.01238, ecapa_loss=0.0001158, whisper_loss=0.09485, over 22603.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0108, ecapa_loss=0.0001431, whisper_loss=0.0898, over 3935835.07 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:25:24,326 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
23 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-19 02:25:31,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4229170.0, ans=0.2 2024-08-19 02:25:32,926 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.86 vs. limit=6.0 2024-08-19 02:25:33,261 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 02:25:38,108 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.817e+01 2.307e+01 2.519e+01 2.780e+01 8.796e+02, threshold=5.038e+01, percent-clipped=1.0 2024-08-19 02:25:45,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4229270.0, ans=0.125 2024-08-19 02:25:49,635 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 02:26:07,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4229470.0, ans=0.0 2024-08-19 02:26:07,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4229470.0, ans=0.2 2024-08-19 02:26:19,575 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 17 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 02:26:20,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7350, loss[loss=0.07548, beats_loss=0.01173, ecapa_loss=0.0001458, whisper_loss=0.06229, over 17775.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01075, ecapa_loss=0.0001418, whisper_loss=0.08957, over 3923151.98 frames. 
], batch size: 76, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:26:34,234 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4229670.0, ans=0.0 2024-08-19 02:26:43,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4229670.0, ans=0.0 2024-08-19 02:26:53,098 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 11 from Vox, 29 fro AS 2024-08-19 02:26:57,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4229770.0, ans=0.125 2024-08-19 02:26:57,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4229770.0, ans=0.0 2024-08-19 02:27:06,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4229870.0, ans=0.2 2024-08-19 02:27:10,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4229870.0, ans=0.125 2024-08-19 02:27:24,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4230070.0, ans=0.0 2024-08-19 02:27:25,397 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7400, loss[loss=0.06978, beats_loss=0.01243, ecapa_loss=0.0001374, whisper_loss=0.05598, over 16058.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001431, whisper_loss=0.08947, over 3850110.51 frames. ], batch size: 65, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:27:29,485 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
38 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 02:27:33,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4230070.0, ans=0.125 2024-08-19 02:27:46,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.315e+01 2.515e+01 2.740e+01 4.360e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 02:27:46,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4230170.0, ans=0.0 2024-08-19 02:27:53,931 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 02:27:57,683 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 02:28:12,852 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 22 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 02:28:13,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4230370.0, ans=0.125 2024-08-19 02:28:19,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4230470.0, ans=0.125 2024-08-19 02:28:19,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0 2024-08-19 02:28:29,073 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7450, loss[loss=0.08984, beats_loss=0.01238, ecapa_loss=0.0001172, whisper_loss=0.07629, over 17166.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01065, ecapa_loss=0.0001429, whisper_loss=0.0899, over 3877620.14 frames. 
], batch size: 68, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:28:35,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4230570.0, ans=0.125 2024-08-19 02:28:43,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4230670.0, ans=0.125 2024-08-19 02:28:47,181 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 02:28:48,776 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:28:58,537 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2024-08-19 02:29:06,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4230870.0, ans=0.0 2024-08-19 02:29:06,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4230870.0, ans=0.125 2024-08-19 02:29:25,837 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 02:29:27,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4230970.0, ans=0.125 2024-08-19 02:29:33,749 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7500, loss[loss=0.1077, beats_loss=0.009271, ecapa_loss=0.0001564, whisper_loss=0.09688, over 23069.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01057, ecapa_loss=0.0001427, whisper_loss=0.09002, over 3878754.38 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:29:43,752 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 02:29:54,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.202e+01 2.431e+01 2.767e+01 3.373e+01, threshold=4.863e+01, percent-clipped=0.0 2024-08-19 02:30:12,356 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 02:30:19,000 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 02:30:26,242 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 02:30:26,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4231470.0, ans=0.125 2024-08-19 02:30:32,622 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 26 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 02:30:37,881 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7550, loss[loss=0.1257, beats_loss=0.006759, ecapa_loss=0.0001772, whisper_loss=0.1172, over 16284.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01058, ecapa_loss=0.0001423, whisper_loss=0.08983, over 3864868.32 frames. ], batch size: 64, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:30:44,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4231570.0, ans=0.035 2024-08-19 02:30:44,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4231570.0, ans=0.1 2024-08-19 02:30:50,430 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-19 02:30:53,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4231670.0, ans=0.1 2024-08-19 02:30:55,843 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
22 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-19 02:30:56,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4231670.0, ans=0.2 2024-08-19 02:31:03,423 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 02:31:31,251 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 02:31:32,757 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4231970.0, ans=0.125 2024-08-19 02:31:36,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4231970.0, ans=0.125 2024-08-19 02:31:38,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4231970.0, ans=0.0 2024-08-19 02:31:41,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7600, loss[loss=0.1114, beats_loss=0.009448, ecapa_loss=0.0001338, whisper_loss=0.1006, over 23965.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.08956, over 3828817.84 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:31:45,470 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-19 02:31:56,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4232170.0, ans=0.1 2024-08-19 02:32:01,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.345e+01 2.567e+01 2.795e+01 4.774e+01, threshold=5.135e+01, percent-clipped=0.0 2024-08-19 02:32:11,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.66 vs. 
limit=15.0 2024-08-19 02:32:14,851 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 02:32:21,615 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.650e+00 2024-08-19 02:32:21,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4232370.0, ans=0.125 2024-08-19 02:32:22,993 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.02 vs. limit=10.0 2024-08-19 02:32:27,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4232370.0, ans=0.125 2024-08-19 02:32:27,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4232370.0, ans=0.07 2024-08-19 02:32:29,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4232370.0, ans=0.04949747468305833 2024-08-19 02:32:31,591 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 02:32:34,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4232470.0, ans=0.0 2024-08-19 02:32:41,598 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 02:32:41,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4232470.0, ans=0.125 2024-08-19 02:32:44,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4232570.0, ans=0.0 2024-08-19 02:32:45,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7650, loss[loss=0.09869, beats_loss=0.01285, ecapa_loss=0.0001226, whisper_loss=0.08462, over 22892.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001426, whisper_loss=0.08987, over 3847361.36 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:32:47,643 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 02:32:59,382 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 02:33:05,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4232670.0, ans=0.0 2024-08-19 02:33:20,863 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.91 vs. limit=8.0 2024-08-19 02:33:29,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4232870.0, ans=0.035 2024-08-19 02:33:38,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4232970.0, ans=0.0 2024-08-19 02:33:39,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4232970.0, ans=0.125 2024-08-19 02:33:40,605 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
27 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 02:33:41,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.68 vs. limit=15.0 2024-08-19 02:33:41,701 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 17 from LS+wenet, 26 from Vox, 44 fro AS 2024-08-19 02:33:48,023 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7700, loss[loss=0.1015, beats_loss=0.01032, ecapa_loss=0.0001508, whisper_loss=0.08968, over 21569.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.08962, over 3855968.93 frames. ], batch size: 87, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:34:01,963 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 02:34:07,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.265e+01 2.506e+01 2.925e+01 4.294e+01, threshold=5.013e+01, percent-clipped=0.0 2024-08-19 02:34:10,030 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.35 vs. limit=22.5 2024-08-19 02:34:14,716 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 02:34:17,269 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 02:34:18,429 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
22 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 02:34:18,780 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4233270.0, ans=0.1 2024-08-19 02:34:22,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4233270.0, ans=0.2 2024-08-19 02:34:26,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4233370.0, ans=0.125 2024-08-19 02:34:34,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4233370.0, ans=0.0 2024-08-19 02:34:51,661 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7750, loss[loss=0.08312, beats_loss=0.01311, ecapa_loss=0.0001093, whisper_loss=0.06892, over 15291.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001436, whisper_loss=0.08986, over 3845672.25 frames. ], batch size: 61, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:35:00,369 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 02:35:05,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4233670.0, ans=0.125 2024-08-19 02:35:24,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4233770.0, ans=0.125 2024-08-19 02:35:26,103 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.10 vs. limit=10.0 2024-08-19 02:35:31,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. 
limit=6.0 2024-08-19 02:35:32,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4233870.0, ans=0.2 2024-08-19 02:35:39,575 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4233870.0, ans=0.125 2024-08-19 02:35:45,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4233970.0, ans=0.125 2024-08-19 02:35:47,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.37 vs. limit=15.0 2024-08-19 02:35:51,163 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 02:35:54,852 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7800, loss[loss=0.09765, beats_loss=0.01108, ecapa_loss=0.0001609, whisper_loss=0.08496, over 22094.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001434, whisper_loss=0.08998, over 3860224.71 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:36:03,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4234070.0, ans=0.125 2024-08-19 02:36:15,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.304e+01 2.560e+01 2.905e+01 1.988e+02, threshold=5.119e+01, percent-clipped=2.0 2024-08-19 02:36:22,808 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 02:36:24,114 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 02:36:31,858 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 02:36:40,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4234370.0, ans=0.07 2024-08-19 02:36:45,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4234470.0, ans=0.0 2024-08-19 02:36:49,443 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4234470.0, ans=0.1 2024-08-19 02:36:57,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4234570.0, ans=0.125 2024-08-19 02:36:57,828 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7850, loss[loss=0.09537, beats_loss=0.01015, ecapa_loss=0.0001456, whisper_loss=0.08377, over 15045.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01046, ecapa_loss=0.0001428, whisper_loss=0.08995, over 3850533.69 frames. ], batch size: 60, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:37:20,413 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 02:37:20,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4234670.0, ans=0.0 2024-08-19 02:37:30,735 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 02:37:36,033 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4234870.0, ans=0.0 2024-08-19 02:37:40,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.79 vs. limit=15.0 2024-08-19 02:37:49,921 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 21 from LS+wenet, 26 from Vox, 45 fro AS 2024-08-19 02:37:55,141 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
32 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 02:38:01,013 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7900, loss[loss=0.08646, beats_loss=0.01302, ecapa_loss=0.0001263, whisper_loss=0.07218, over 22443.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01052, ecapa_loss=0.0001435, whisper_loss=0.09039, over 3865595.26 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:38:04,038 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 02:38:07,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4235070.0, ans=0.125 2024-08-19 02:38:20,847 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.393e+01 2.663e+01 2.999e+01 6.865e+01, threshold=5.327e+01, percent-clipped=3.0 2024-08-19 02:38:21,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4235170.0, ans=0.1 2024-08-19 02:38:45,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4235370.0, ans=0.2 2024-08-19 02:38:46,071 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 02:38:55,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.15 vs. limit=15.0 2024-08-19 02:39:02,321 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 02:39:03,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 7950, loss[loss=0.09216, beats_loss=0.009947, ecapa_loss=0.0001472, whisper_loss=0.08074, over 15879.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01051, ecapa_loss=0.0001433, whisper_loss=0.0908, over 3853101.38 frames. 
], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:39:03,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4235570.0, ans=0.0 2024-08-19 02:39:04,758 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 02:39:12,130 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 02:39:29,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4235770.0, ans=0.125 2024-08-19 02:39:37,742 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 28 from Vox, 33 fro AS 2024-08-19 02:39:56,150 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 02:40:03,517 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 19 from LS+wenet, 27 from Vox, 50 fro AS 2024-08-19 02:40:04,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8000, loss[loss=0.07955, beats_loss=0.01379, ecapa_loss=0.0001327, whisper_loss=0.06444, over 22652.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001422, whisper_loss=0.09042, over 3866097.81 frames. ], batch size: 96, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:40:24,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.236e+01 2.508e+01 2.783e+01 4.268e+01, threshold=5.017e+01, percent-clipped=0.0 2024-08-19 02:40:24,428 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 19 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-19 02:40:24,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4236170.0, ans=0.0 2024-08-19 02:40:33,737 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=11.23 vs. 
limit=12.0 2024-08-19 02:40:35,932 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 02:40:46,845 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 02:40:49,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.46 vs. limit=22.5 2024-08-19 02:41:05,892 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8050, loss[loss=0.07482, beats_loss=0.00887, ecapa_loss=0.0001279, whisper_loss=0.06467, over 15768.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01061, ecapa_loss=0.0001414, whisper_loss=0.09004, over 3899444.69 frames. ], batch size: 59, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:41:06,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4236570.0, ans=0.2 2024-08-19 02:41:13,546 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 02:41:14,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=22.5 2024-08-19 02:41:15,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.25 vs. limit=22.5 2024-08-19 02:41:28,081 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 02:41:34,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4236770.0, ans=0.015 2024-08-19 02:41:42,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4236870.0, ans=0.0 2024-08-19 02:41:52,251 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
31 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 02:41:53,585 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4236870.0, ans=0.1 2024-08-19 02:41:58,193 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-19 02:41:59,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4236970.0, ans=0.0 2024-08-19 02:42:03,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4236970.0, ans=0.035 2024-08-19 02:42:06,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4237070.0, ans=0.1 2024-08-19 02:42:07,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8100, loss[loss=0.111, beats_loss=0.009119, ecapa_loss=0.0001621, whisper_loss=0.1002, over 22193.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.09031, over 3889734.26 frames. ], batch size: 94, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:42:15,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=12.0 2024-08-19 02:42:27,406 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.244e+01 2.525e+01 2.786e+01 3.995e+01, threshold=5.049e+01, percent-clipped=0.0 2024-08-19 02:42:30,034 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 02:42:33,455 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 02:42:39,332 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 02:42:40,532 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
25 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 02:42:54,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4237370.0, ans=0.1 2024-08-19 02:43:03,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4237470.0, ans=0.1 2024-08-19 02:43:04,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4237470.0, ans=0.125 2024-08-19 02:43:06,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4237470.0, ans=0.2 2024-08-19 02:43:08,504 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8150, loss[loss=0.09382, beats_loss=0.01212, ecapa_loss=0.0001451, whisper_loss=0.08025, over 19447.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001426, whisper_loss=0.08973, over 3866094.47 frames. ], batch size: 79, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:43:17,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4237570.0, ans=0.0 2024-08-19 02:43:25,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4237670.0, ans=0.0 2024-08-19 02:43:33,277 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2024-08-19 02:43:36,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4237770.0, ans=0.05 2024-08-19 02:43:43,826 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
17 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 02:43:56,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4237970.0, ans=0.125 2024-08-19 02:44:03,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4237970.0, ans=0.125 2024-08-19 02:44:09,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8200, loss[loss=0.108, beats_loss=0.009192, ecapa_loss=0.0001435, whisper_loss=0.09737, over 15968.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001436, whisper_loss=0.09016, over 3873334.65 frames. ], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:44:09,712 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 02:44:19,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4238070.0, ans=0.1 2024-08-19 02:44:28,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.281e+01 2.470e+01 2.775e+01 3.796e+01, threshold=4.940e+01, percent-clipped=0.0 2024-08-19 02:44:35,056 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
28 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 02:44:45,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4238370.0, ans=0.0 2024-08-19 02:44:46,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4238370.0, ans=0.125 2024-08-19 02:44:54,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4238370.0, ans=0.125 2024-08-19 02:44:58,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4238470.0, ans=0.0 2024-08-19 02:45:10,506 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8250, loss[loss=0.1083, beats_loss=0.009916, ecapa_loss=0.0001665, whisper_loss=0.0967, over 21944.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001436, whisper_loss=0.09062, over 3884833.17 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:45:21,164 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.419e-01 2024-08-19 02:45:32,387 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4238670.0, ans=0.0 2024-08-19 02:45:56,880 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-19 02:46:12,820 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8300, loss[loss=0.1135, beats_loss=0.009444, ecapa_loss=0.000124, whisper_loss=0.1028, over 17181.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01051, ecapa_loss=0.0001425, whisper_loss=0.09065, over 3888480.46 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:46:15,516 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
22 from LS+wenet, 23 from Vox, 47 fro AS 2024-08-19 02:46:18,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2024-08-19 02:46:25,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4239170.0, ans=0.125 2024-08-19 02:46:25,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4239170.0, ans=0.125 2024-08-19 02:46:25,920 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-19 02:46:29,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4239170.0, ans=0.1 2024-08-19 02:46:32,637 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.960e+01 2.439e+01 2.594e+01 2.906e+01 5.754e+01, threshold=5.188e+01, percent-clipped=1.0 2024-08-19 02:46:41,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4239270.0, ans=0.09899494936611666 2024-08-19 02:46:48,727 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 02:46:48,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4239370.0, ans=0.0 2024-08-19 02:46:54,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4239370.0, ans=0.07 2024-08-19 02:46:59,781 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 22 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-19 02:47:00,856 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
19 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 02:47:12,976 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 02:47:13,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8350, loss[loss=0.1099, beats_loss=0.009468, ecapa_loss=0.0001222, whisper_loss=0.09917, over 24303.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01055, ecapa_loss=0.0001423, whisper_loss=0.09044, over 3881537.18 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:47:20,053 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 02:47:21,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4239570.0, ans=0.125 2024-08-19 02:47:55,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4239870.0, ans=0.125 2024-08-19 02:47:56,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4239870.0, ans=0.125 2024-08-19 02:48:02,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4239970.0, ans=0.2 2024-08-19 02:48:17,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8400, loss[loss=0.0861, beats_loss=0.008443, ecapa_loss=0.0001448, whisper_loss=0.07621, over 16465.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.000142, whisper_loss=0.09059, over 3863224.86 frames. 
], batch size: 64, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:48:33,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4240170.0, ans=0.2 2024-08-19 02:48:36,823 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.697e+01 2.296e+01 2.515e+01 2.691e+01 3.893e+01, threshold=5.031e+01, percent-clipped=0.0 2024-08-19 02:48:40,434 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 15 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 02:48:55,363 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 8 from Vox, 32 fro AS 2024-08-19 02:48:58,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=15.0 2024-08-19 02:49:06,953 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.39 vs. limit=10.0 2024-08-19 02:49:15,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4240470.0, ans=0.1 2024-08-19 02:49:18,312 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8450, loss[loss=0.09008, beats_loss=0.01265, ecapa_loss=0.0001039, whisper_loss=0.07638, over 20019.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001424, whisper_loss=0.0904, over 3849211.95 frames. ], batch size: 80, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:49:27,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4240570.0, ans=0.125 2024-08-19 02:49:32,516 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:49:39,020 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
20 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 02:49:44,051 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 02:49:46,588 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 02:49:50,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4240770.0, ans=0.125 2024-08-19 02:49:50,775 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0 2024-08-19 02:49:57,343 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-19 02:50:07,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4240970.0, ans=0.125 2024-08-19 02:50:18,819 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8500, loss[loss=0.1328, beats_loss=0.006954, ecapa_loss=0.000162, whisper_loss=0.1242, over 17842.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01048, ecapa_loss=0.0001426, whisper_loss=0.09028, over 3860596.82 frames. ], batch size: 70, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:50:24,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4241070.0, ans=0.125 2024-08-19 02:50:26,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4241070.0, ans=0.1 2024-08-19 02:50:35,723 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 02:50:36,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4241170.0, ans=0.0 2024-08-19 02:50:37,938 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.287e+01 2.464e+01 2.780e+01 3.780e+01, threshold=4.928e+01, percent-clipped=0.0 2024-08-19 02:50:49,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4241270.0, ans=0.0 2024-08-19 02:51:03,008 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4241370.0, ans=0.125 2024-08-19 02:51:18,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4241570.0, ans=0.09899494936611666 2024-08-19 02:51:19,389 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8550, loss[loss=0.09897, beats_loss=0.01225, ecapa_loss=0.0001225, whisper_loss=0.08549, over 21697.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01051, ecapa_loss=0.0001424, whisper_loss=0.08981, over 3884177.20 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:51:23,337 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 02:51:34,761 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4241670.0, ans=0.0 2024-08-19 02:51:55,537 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 11 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 02:52:21,324 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8600, loss[loss=0.09426, beats_loss=0.01011, ecapa_loss=0.0001346, whisper_loss=0.0828, over 14814.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001421, whisper_loss=0.09026, over 3812840.73 frames. 
], batch size: 57, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:52:42,036 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.302e+01 2.568e+01 2.849e+01 4.123e+01, threshold=5.135e+01, percent-clipped=0.0 2024-08-19 02:52:49,395 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4242270.0, ans=0.125 2024-08-19 02:52:53,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4242270.0, ans=0.0 2024-08-19 02:53:00,754 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 18 from LS+wenet, 25 from Vox, 45 fro AS 2024-08-19 02:53:03,252 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 29 from LS+wenet, 18 from Vox, 30 fro AS 2024-08-19 02:53:08,811 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 02:53:16,065 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 02:53:17,446 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 02:53:22,025 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 02:53:28,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4242470.0, ans=0.0 2024-08-19 02:53:30,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8650, loss[loss=0.08601, beats_loss=0.01218, ecapa_loss=0.0001876, whisper_loss=0.07196, over 14953.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01051, ecapa_loss=0.0001417, whisper_loss=0.08962, over 3814477.47 frames. 
], batch size: 62, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:53:34,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4242570.0, ans=0.0 2024-08-19 02:53:35,928 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 29 from Vox, 34 fro AS 2024-08-19 02:53:43,024 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 02:53:44,164 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 02:53:45,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4242670.0, ans=0.2 2024-08-19 02:53:50,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=4242670.0, ans=0.2 2024-08-19 02:53:57,422 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 22 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 02:54:10,637 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4242770.0, ans=0.125 2024-08-19 02:54:10,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-19 02:54:16,921 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4242870.0, ans=0.0 2024-08-19 02:54:16,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4242870.0, ans=0.1 2024-08-19 02:54:23,392 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 02:54:25,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4242870.0, ans=0.125 2024-08-19 02:54:26,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4242870.0, ans=0.0 2024-08-19 02:54:41,464 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=4.84 vs. limit=10.0 2024-08-19 02:54:43,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4243070.0, ans=0.0 2024-08-19 02:54:44,079 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2024-08-19 02:54:44,488 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8700, loss[loss=0.08096, beats_loss=0.01276, ecapa_loss=0.0001107, whisper_loss=0.06709, over 22480.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001414, whisper_loss=0.09014, over 3862113.36 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:54:45,874 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 02:54:52,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4243070.0, ans=0.0 2024-08-19 02:55:04,111 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.949e+01 2.365e+01 2.539e+01 2.840e+01 3.770e+01, threshold=5.079e+01, percent-clipped=0.0 2024-08-19 02:55:04,295 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
21 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-19 02:55:07,074 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 02:55:26,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4243370.0, ans=0.125 2024-08-19 02:55:31,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4243370.0, ans=10.0 2024-08-19 02:55:38,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2024-08-19 02:55:45,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8750, loss[loss=0.09543, beats_loss=0.008908, ecapa_loss=0.0001626, whisper_loss=0.0849, over 20907.00 frames. ], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001413, whisper_loss=0.09017, over 3811895.18 frames. ], batch size: 81, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:56:11,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4243770.0, ans=0.125 2024-08-19 02:56:17,484 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-08-19 02:56:31,621 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 02:56:36,519 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 02:56:46,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8800, loss[loss=0.08003, beats_loss=0.01442, ecapa_loss=0.0001138, whisper_loss=0.06447, over 21657.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001411, whisper_loss=0.09026, over 3866492.93 frames. ], batch size: 90, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:56:57,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.99 vs. limit=22.5 2024-08-19 02:57:05,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4244170.0, ans=0.0 2024-08-19 02:57:05,910 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.929e+01 2.300e+01 2.475e+01 2.796e+01 3.899e+01, threshold=4.950e+01, percent-clipped=0.0 2024-08-19 02:57:19,281 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 02:57:20,640 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 02:57:35,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4244470.0, ans=0.1 2024-08-19 02:57:36,692 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 02:57:47,823 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8850, loss[loss=0.116, beats_loss=0.008034, ecapa_loss=0.000162, whisper_loss=0.1063, over 23246.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.09043, over 3877999.70 frames. ], batch size: 93, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:57:48,215 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4244570.0, ans=0.125 2024-08-19 02:57:49,233 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
35 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 02:58:05,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4244670.0, ans=0.1 2024-08-19 02:58:23,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4244870.0, ans=0.0 2024-08-19 02:58:28,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4244870.0, ans=0.1 2024-08-19 02:58:44,710 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 02:58:49,455 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8900, loss[loss=0.1179, beats_loss=0.01083, ecapa_loss=0.0001356, whisper_loss=0.1057, over 22387.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01049, ecapa_loss=0.0001404, whisper_loss=0.0911, over 3885693.32 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:58:49,563 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 02:58:49,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4245070.0, ans=0.0 2024-08-19 02:59:04,393 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 02:59:08,991 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.326e+01 2.543e+01 2.750e+01 4.033e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-19 02:59:13,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4245270.0, ans=0.2 2024-08-19 02:59:19,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4245270.0, ans=0.125 2024-08-19 02:59:30,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4245370.0, ans=0.2 2024-08-19 02:59:49,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4245470.0, ans=0.125 2024-08-19 02:59:51,580 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 8950, loss[loss=0.1199, beats_loss=0.009903, ecapa_loss=0.0001152, whisper_loss=0.1089, over 18541.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01045, ecapa_loss=0.0001406, whisper_loss=0.09118, over 3893331.69 frames. ], batch size: 66, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 02:59:59,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4245570.0, ans=0.125 2024-08-19 03:00:04,677 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-08-19 03:00:06,562 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
27 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 03:00:20,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4245770.0, ans=0.125 2024-08-19 03:00:31,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4245870.0, ans=0.0 2024-08-19 03:00:45,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=4245970.0, ans=15.0 2024-08-19 03:00:46,621 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 03:00:46,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4245970.0, ans=0.1 2024-08-19 03:00:53,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4246070.0, ans=0.0 2024-08-19 03:00:53,896 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9000, loss[loss=0.09057, beats_loss=0.01074, ecapa_loss=0.0001801, whisper_loss=0.07803, over 20384.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001422, whisper_loss=0.09064, over 3872397.33 frames. ], batch size: 89, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 03:00:53,897 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 03:01:30,338 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on ASR_libri: loss=0.2527, beats_loss=0, ecapa_loss=0.0005203, whisper_loss=0.2475, over 922467.00 frames. 2024-08-19 03:01:46,067 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on SV_voxceleb1: loss=0.004041, beats_loss=0, ecapa_loss=0.0004041, whisper_loss=0, over 939242.00 frames. 
2024-08-19 03:03:34,148 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on AT_audioset: loss=0.02316, beats_loss=0.02316, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 03:03:34,151 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 03:03:53,424 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.336e+01 2.586e+01 2.879e+01 3.784e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-19 03:04:01,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4246270.0, ans=0.1 2024-08-19 03:04:08,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4246270.0, ans=0.2 2024-08-19 03:04:18,755 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4246370.0, ans=0.125 2024-08-19 03:04:27,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4246470.0, ans=0.2 2024-08-19 03:04:30,144 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.31 vs. limit=6.0 2024-08-19 03:04:34,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4246570.0, ans=0.2 2024-08-19 03:04:35,622 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9050, loss[loss=0.1184, beats_loss=0.008723, ecapa_loss=0.0001433, whisper_loss=0.1082, over 20306.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.0104, ecapa_loss=0.0001426, whisper_loss=0.09139, over 3883159.75 frames. 
], batch size: 78, lr: 2.11e-03, grad_scale: 5.764607523034235e+17 2024-08-19 03:04:51,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4246670.0, ans=0.125 2024-08-19 03:04:55,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=4246670.0, ans=0.02 2024-08-19 03:05:00,688 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 03:05:15,294 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 03:05:16,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4246870.0, ans=0.025 2024-08-19 03:05:29,026 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 03:05:37,366 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9100, loss[loss=0.09667, beats_loss=0.01051, ecapa_loss=0.0001559, whisper_loss=0.08461, over 21771.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001422, whisper_loss=0.09094, over 3895954.85 frames. 
], batch size: 94, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:05:58,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.909e+01 2.412e+01 2.655e+01 2.932e+01 4.507e+01, threshold=5.309e+01, percent-clipped=0.0 2024-08-19 03:06:04,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4247270.0, ans=0.125 2024-08-19 03:06:21,853 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4247370.0, ans=0.2 2024-08-19 03:06:24,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4247370.0, ans=0.1 2024-08-19 03:06:37,988 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-19 03:06:38,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9150, loss[loss=0.1131, beats_loss=0.009398, ecapa_loss=0.0001703, whisper_loss=0.102, over 21027.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001424, whisper_loss=0.0907, over 3912617.18 frames. ], batch size: 88, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:06:44,060 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0 2024-08-19 03:06:57,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4247670.0, ans=0.0 2024-08-19 03:07:04,654 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.94 vs. 
limit=15.0 2024-08-19 03:07:09,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4247770.0, ans=0.0 2024-08-19 03:07:12,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4247770.0, ans=0.125 2024-08-19 03:07:21,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.97 vs. limit=10.0 2024-08-19 03:07:21,978 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 03:07:42,276 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9200, loss[loss=0.1093, beats_loss=0.01077, ecapa_loss=0.000126, whisper_loss=0.09724, over 17047.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.000143, whisper_loss=0.09024, over 3924944.89 frames. ], batch size: 68, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:07:50,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4248070.0, ans=0.1 2024-08-19 03:07:51,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4248070.0, ans=0.1 2024-08-19 03:08:04,114 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.830e+01 2.284e+01 2.554e+01 2.841e+01 4.515e+02, threshold=5.108e+01, percent-clipped=1.0 2024-08-19 03:08:07,870 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=22.5 2024-08-19 03:08:21,103 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 03:08:36,307 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
31 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 03:08:39,369 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 40 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-19 03:08:40,534 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 03:08:42,707 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2024-08-19 03:08:47,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9250, loss[loss=0.1062, beats_loss=0.01076, ecapa_loss=0.0001472, whisper_loss=0.09396, over 21604.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01046, ecapa_loss=0.0001442, whisper_loss=0.09062, over 3920671.65 frames. ], batch size: 91, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:08:48,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4248570.0, ans=0.0 2024-08-19 03:08:50,763 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 03:08:57,337 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 03:08:58,463 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 18 from LS+wenet, 29 from Vox, 42 fro AS 2024-08-19 03:09:26,352 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4248870.0, ans=0.2 2024-08-19 03:09:42,162 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-08-19 03:09:54,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9300, loss[loss=0.1116, beats_loss=0.008865, ecapa_loss=0.0001746, whisper_loss=0.101, over 17061.00 frames. 
], tot_loss[loss=0.1024, beats_loss=0.01048, ecapa_loss=0.0001436, whisper_loss=0.09046, over 3909304.05 frames. ], batch size: 72, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:10:02,158 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 03:10:04,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4249070.0, ans=0.125 2024-08-19 03:10:15,659 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4249170.0, ans=0.2 2024-08-19 03:10:16,510 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-19 03:10:17,512 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.789e+01 2.468e+01 2.687e+01 3.069e+01 9.721e+01, threshold=5.373e+01, percent-clipped=1.0 2024-08-19 03:10:22,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4249270.0, ans=10.0 2024-08-19 03:10:22,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=4249270.0, ans=0.02 2024-08-19 03:10:34,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4249370.0, ans=0.0 2024-08-19 03:10:38,918 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 03:10:44,844 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4249370.0, ans=0.0 2024-08-19 03:10:51,277 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 03:10:52,855 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:10:56,283 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 21 from Vox, 23 fro AS 2024-08-19 03:10:57,859 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 30 from LS+wenet, 28 from Vox, 25 fro AS 2024-08-19 03:11:00,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4249570.0, ans=0.0 2024-08-19 03:11:01,530 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9350, loss[loss=0.102, beats_loss=0.008573, ecapa_loss=0.0001605, whisper_loss=0.09178, over 23175.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001436, whisper_loss=0.0907, over 3894838.27 frames. ], batch size: 92, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:11:06,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4249570.0, ans=0.125 2024-08-19 03:11:15,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4249670.0, ans=10.0 2024-08-19 03:11:15,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4249670.0, ans=0.0 2024-08-19 03:11:24,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4249670.0, ans=0.0 2024-08-19 03:11:35,641 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 03:11:38,269 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
20 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 03:11:41,687 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4249870.0, ans=0.0 2024-08-19 03:11:45,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4249870.0, ans=0.0 2024-08-19 03:11:46,182 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2024-08-19 03:11:56,227 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 16 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 03:11:58,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-08-19 03:12:04,492 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 10 from Vox, 30 fro AS 2024-08-19 03:12:08,086 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9400, loss[loss=0.1035, beats_loss=0.009556, ecapa_loss=0.0001264, whisper_loss=0.09264, over 20979.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.000143, whisper_loss=0.08952, over 3870097.15 frames. ], batch size: 82, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:12:17,436 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 03:12:21,556 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 03:12:32,929 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.288e+01 2.511e+01 2.753e+01 4.434e+01, threshold=5.022e+01, percent-clipped=0.0 2024-08-19 03:12:58,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4250370.0, ans=0.1 2024-08-19 03:13:06,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4250470.0, ans=0.0 2024-08-19 03:13:08,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.32 vs. limit=15.0 2024-08-19 03:13:17,893 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9450, loss[loss=0.0831, beats_loss=0.01374, ecapa_loss=0.0001373, whisper_loss=0.06799, over 20175.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001435, whisper_loss=0.08966, over 3894733.71 frames. ], batch size: 86, lr: 2.11e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:13:20,495 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 12 from Vox, 27 fro AS 2024-08-19 03:13:20,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4250570.0, ans=0.125 2024-08-19 03:13:30,109 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 03:13:32,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4250670.0, ans=0.125 2024-08-19 03:13:35,663 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
23 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 03:13:39,044 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4250670.0, ans=0.125 2024-08-19 03:13:43,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4250670.0, ans=0.1 2024-08-19 03:13:48,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-19 03:13:50,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4250770.0, ans=0.0 2024-08-19 03:14:07,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4250870.0, ans=0.1 2024-08-19 03:14:08,337 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 03:14:08,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2024-08-19 03:14:10,612 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.22 vs. limit=15.0 2024-08-19 03:14:12,589 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4250970.0, ans=0.125 2024-08-19 03:14:23,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4250970.0, ans=15.0 2024-08-19 03:14:25,065 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. 
limit=15.0 2024-08-19 03:14:26,772 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9500, loss[loss=0.07917, beats_loss=0.0118, ecapa_loss=0.0001199, whisper_loss=0.06617, over 17111.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01054, ecapa_loss=0.0001432, whisper_loss=0.08917, over 3879221.18 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:14:37,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4251070.0, ans=0.2 2024-08-19 03:14:37,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.79 vs. limit=10.0 2024-08-19 03:14:42,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4251170.0, ans=0.125 2024-08-19 03:14:46,307 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 19 from Vox, 48 fro AS 2024-08-19 03:14:50,646 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.262e+01 2.501e+01 2.810e+01 4.302e+01, threshold=5.003e+01, percent-clipped=0.0 2024-08-19 03:14:54,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.38 vs. 
limit=15.0 2024-08-19 03:14:58,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4251270.0, ans=0.125 2024-08-19 03:15:04,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4251270.0, ans=0.125 2024-08-19 03:15:15,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4251370.0, ans=0.125 2024-08-19 03:15:19,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4251370.0, ans=0.0 2024-08-19 03:15:36,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9550, loss[loss=0.1204, beats_loss=0.008034, ecapa_loss=0.0001468, whisper_loss=0.1109, over 17768.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01047, ecapa_loss=0.0001431, whisper_loss=0.08986, over 3857560.80 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:15:37,807 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 03:16:25,563 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 28 from Vox, 21 fro AS 2024-08-19 03:16:44,663 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9600, loss[loss=0.1056, beats_loss=0.009491, ecapa_loss=0.000136, whisper_loss=0.09475, over 19871.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001435, whisper_loss=0.08991, over 3850232.74 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:16:47,449 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-19 03:16:48,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.18 vs. 
limit=22.5 2024-08-19 03:16:50,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4252070.0, ans=0.125 2024-08-19 03:17:04,455 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-19 03:17:08,141 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.320e+01 2.551e+01 2.874e+01 5.589e+01, threshold=5.101e+01, percent-clipped=1.0 2024-08-19 03:17:26,253 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 30 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 03:17:37,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4252370.0, ans=0.2 2024-08-19 03:17:41,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4252470.0, ans=0.125 2024-08-19 03:17:43,800 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 35 from Vox, 29 fro AS 2024-08-19 03:17:44,268 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.75 vs. limit=22.5 2024-08-19 03:17:52,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4252570.0, ans=0.1 2024-08-19 03:17:52,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9650, loss[loss=0.1178, beats_loss=0.01023, ecapa_loss=0.0001427, whisper_loss=0.1062, over 22400.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01029, ecapa_loss=0.0001447, whisper_loss=0.09094, over 3862141.88 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:17:56,869 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.18 vs. 
limit=15.0 2024-08-19 03:17:57,451 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 23 from Vox, 32 fro AS 2024-08-19 03:18:15,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4252670.0, ans=0.125 2024-08-19 03:18:57,182 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4252970.0, ans=0.125 2024-08-19 03:18:59,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4252970.0, ans=0.125 2024-08-19 03:19:02,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9700, loss[loss=0.09673, beats_loss=0.00911, ecapa_loss=0.000168, whisper_loss=0.08594, over 17167.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01031, ecapa_loss=0.0001454, whisper_loss=0.09082, over 3859691.78 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:19:24,449 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.350e+01 2.550e+01 2.854e+01 4.797e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-19 03:19:28,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4253270.0, ans=0.125 2024-08-19 03:19:30,565 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 23 from Vox, 25 fro AS 2024-08-19 03:19:32,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4253270.0, ans=0.0 2024-08-19 03:19:33,875 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4253270.0, ans=0.125 2024-08-19 03:19:34,824 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
27 from LS+wenet, 9 from Vox, 37 fro AS 2024-08-19 03:19:41,896 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4253370.0, ans=0.125 2024-08-19 03:19:43,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4253370.0, ans=0.125 2024-08-19 03:20:08,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4253570.0, ans=0.0 2024-08-19 03:20:09,347 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9750, loss[loss=0.07065, beats_loss=0.01056, ecapa_loss=0.0001596, whisper_loss=0.05849, over 12799.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001442, whisper_loss=0.08982, over 3844541.76 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:20:14,805 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 19 from Vox, 21 fro AS 2024-08-19 03:20:16,856 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0 2024-08-19 03:20:40,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4253770.0, ans=0.0 2024-08-19 03:20:43,329 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4253770.0, ans=0.04949747468305833 2024-08-19 03:20:47,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4253770.0, ans=0.125 2024-08-19 03:21:05,418 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.53 vs. 
limit=15.0 2024-08-19 03:21:16,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9800, loss[loss=0.1296, beats_loss=0.00869, ecapa_loss=0.0001184, whisper_loss=0.1197, over 20862.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001434, whisper_loss=0.08955, over 3858376.84 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:21:24,928 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.73 vs. limit=22.5 2024-08-19 03:21:32,728 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 03:21:33,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-08-19 03:21:36,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4254170.0, ans=0.125 2024-08-19 03:21:40,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4254170.0, ans=0.1 2024-08-19 03:21:40,856 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.889e+01 2.265e+01 2.575e+01 2.940e+01 5.043e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-19 03:21:42,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4254170.0, ans=0.125 2024-08-19 03:21:46,531 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=5.188e-03 2024-08-19 03:21:49,017 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
18 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 03:21:53,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4254270.0, ans=0.035 2024-08-19 03:22:05,396 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 16 from Vox, 41 fro AS 2024-08-19 03:22:06,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4254370.0, ans=0.0 2024-08-19 03:22:13,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4254470.0, ans=0.125 2024-08-19 03:22:14,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4254470.0, ans=0.1 2024-08-19 03:22:26,177 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 03:22:27,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9850, loss[loss=0.1043, beats_loss=0.01067, ecapa_loss=0.0001289, whisper_loss=0.09237, over 17352.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.08996, over 3880939.39 frames. ], batch size: 65, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:22:34,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4254570.0, ans=0.125 2024-08-19 03:22:36,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4254570.0, ans=0.1 2024-08-19 03:22:46,727 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 03:22:56,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4254770.0, ans=0.125 2024-08-19 03:23:12,854 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
25 from LS+wenet, 12 from Vox, 34 fro AS 2024-08-19 03:23:14,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4254870.0, ans=0.125 2024-08-19 03:23:38,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9900, loss[loss=0.1038, beats_loss=0.01039, ecapa_loss=0.0001299, whisper_loss=0.0921, over 20302.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.000142, whisper_loss=0.09017, over 3883616.94 frames. ], batch size: 80, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:23:38,311 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 03:23:49,759 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 03:23:51,258 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 03:23:52,968 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4255170.0, ans=0.0 2024-08-19 03:24:01,890 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.258e+01 2.526e+01 2.831e+01 1.628e+02, threshold=5.053e+01, percent-clipped=0.0 2024-08-19 03:24:05,587 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 03:24:15,168 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 03:24:18,407 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=12.0 2024-08-19 03:24:23,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4255370.0, ans=0.125 2024-08-19 03:24:27,237 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 23 from Vox, 44 fro AS 2024-08-19 03:24:33,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4255470.0, ans=0.125 2024-08-19 03:24:41,642 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2024-08-19 03:24:48,438 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4255570.0, ans=0.125 2024-08-19 03:24:49,068 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 9950, loss[loss=0.1053, beats_loss=0.01223, ecapa_loss=0.0001287, whisper_loss=0.09173, over 22324.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001428, whisper_loss=0.08987, over 3874826.72 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:24:53,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4255570.0, ans=0.0 2024-08-19 03:25:00,603 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4255570.0, ans=0.0 2024-08-19 03:25:08,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4255670.0, ans=0.0 2024-08-19 03:25:42,727 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
17 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 03:25:55,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4255970.0, ans=0.125 2024-08-19 03:26:01,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4256070.0, ans=0.0 2024-08-19 03:26:02,740 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10000, loss[loss=0.1062, beats_loss=0.007302, ecapa_loss=0.0001577, whisper_loss=0.0973, over 16035.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01049, ecapa_loss=0.0001431, whisper_loss=0.09019, over 3856249.96 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:26:06,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4256070.0, ans=0.2 2024-08-19 03:26:07,968 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-08-19 03:26:20,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4256170.0, ans=0.2 2024-08-19 03:26:21,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4256170.0, ans=0.2 2024-08-19 03:26:30,547 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.221e+01 2.512e+01 2.759e+01 2.738e+02, threshold=5.023e+01, percent-clipped=3.0 2024-08-19 03:26:30,678 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 03:26:31,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4256170.0, ans=0.0 2024-08-19 03:26:34,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4256270.0, ans=0.125 2024-08-19 03:27:09,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4256470.0, ans=0.1 2024-08-19 03:27:19,989 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10050, loss[loss=0.0939, beats_loss=0.01073, ecapa_loss=0.0001203, whisper_loss=0.08197, over 15731.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001437, whisper_loss=0.09032, over 3842744.70 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:27:26,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4256570.0, ans=0.0 2024-08-19 03:27:38,979 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-19 03:27:40,585 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.91 vs. 
limit=15.0 2024-08-19 03:27:50,925 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:27:53,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4256770.0, ans=0.0 2024-08-19 03:28:10,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4256870.0, ans=0.125 2024-08-19 03:28:14,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4256870.0, ans=10.0 2024-08-19 03:28:27,679 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 03:28:36,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10100, loss[loss=0.103, beats_loss=0.01199, ecapa_loss=0.0001068, whisper_loss=0.08991, over 23551.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001437, whisper_loss=0.09027, over 3874762.15 frames. 
], batch size: 91, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:28:41,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4257070.0, ans=0.125 2024-08-19 03:28:44,131 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4257070.0, ans=0.1 2024-08-19 03:29:02,277 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.774e+01 2.419e+01 2.709e+01 3.030e+01 4.080e+01, threshold=5.418e+01, percent-clipped=0.0 2024-08-19 03:29:25,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4257370.0, ans=0.0 2024-08-19 03:29:27,288 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4257370.0, ans=0.125 2024-08-19 03:29:36,837 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.85 vs. limit=15.0 2024-08-19 03:29:37,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4257370.0, ans=0.1 2024-08-19 03:29:39,751 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4257470.0, ans=0.125 2024-08-19 03:29:52,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4257470.0, ans=0.0 2024-08-19 03:29:56,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10150, loss[loss=0.1093, beats_loss=0.009384, ecapa_loss=0.0001286, whisper_loss=0.09864, over 14367.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01044, ecapa_loss=0.0001448, whisper_loss=0.09013, over 3873588.60 frames. 
], batch size: 55, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:30:01,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4257570.0, ans=0.125 2024-08-19 03:30:20,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4257670.0, ans=0.0 2024-08-19 03:30:22,489 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=12.0 2024-08-19 03:30:48,542 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4257870.0, ans=0.0 2024-08-19 03:30:56,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4257970.0, ans=0.1 2024-08-19 03:30:56,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4257970.0, ans=0.1 2024-08-19 03:31:08,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10200, loss[loss=0.1036, beats_loss=0.01204, ecapa_loss=0.0001428, whisper_loss=0.0901, over 22983.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01049, ecapa_loss=0.0001452, whisper_loss=0.08896, over 3879508.54 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:31:23,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.63 vs. 
limit=22.5 2024-08-19 03:31:32,159 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.806e+01 2.333e+01 2.551e+01 2.847e+01 4.838e+01, threshold=5.102e+01, percent-clipped=0.0 2024-08-19 03:31:44,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4258270.0, ans=0.0 2024-08-19 03:31:58,481 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 28 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 03:32:11,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4258470.0, ans=0.0 2024-08-19 03:32:17,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10250, loss[loss=0.1218, beats_loss=0.007017, ecapa_loss=0.0001963, whisper_loss=0.1129, over 18210.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001449, whisper_loss=0.08996, over 3897036.43 frames. ], batch size: 76, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:32:24,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4258570.0, ans=0.0 2024-08-19 03:32:43,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4258670.0, ans=0.5 2024-08-19 03:32:48,342 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
21 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 03:32:52,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4258770.0, ans=0.125 2024-08-19 03:32:59,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4258870.0, ans=0.0 2024-08-19 03:33:04,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4258870.0, ans=0.0 2024-08-19 03:33:19,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4258970.0, ans=0.2 2024-08-19 03:33:20,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4258970.0, ans=0.125 2024-08-19 03:33:27,114 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10300, loss[loss=0.1213, beats_loss=0.008564, ecapa_loss=0.0001463, whisper_loss=0.1113, over 17208.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001445, whisper_loss=0.09059, over 3891066.77 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:33:49,514 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.782e+01 2.334e+01 2.546e+01 2.816e+01 4.072e+01, threshold=5.092e+01, percent-clipped=0.0 2024-08-19 03:33:57,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4259270.0, ans=0.1 2024-08-19 03:34:07,356 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:34:21,556 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
11 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 03:34:35,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10350, loss[loss=0.1169, beats_loss=0.01227, ecapa_loss=9.917e-05, whisper_loss=0.1036, over 16582.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01037, ecapa_loss=0.000144, whisper_loss=0.09066, over 3883683.17 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:35:03,456 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4259670.0, ans=0.0 2024-08-19 03:35:03,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4259670.0, ans=0.0 2024-08-19 03:35:07,381 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 03:35:09,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0 2024-08-19 03:35:20,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.39 vs. limit=15.0 2024-08-19 03:35:25,351 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 03:35:48,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10400, loss[loss=0.09667, beats_loss=0.009942, ecapa_loss=0.0001497, whisper_loss=0.08523, over 20593.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01026, ecapa_loss=0.0001433, whisper_loss=0.09118, over 3849165.92 frames. 
], batch size: 83, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:36:11,408 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.822e+01 2.301e+01 2.515e+01 2.779e+01 4.056e+01, threshold=5.030e+01, percent-clipped=0.0 2024-08-19 03:36:22,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2024-08-19 03:36:42,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4260470.0, ans=0.125 2024-08-19 03:36:45,434 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.02 vs. limit=22.5 2024-08-19 03:36:55,419 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10450, loss[loss=0.1142, beats_loss=0.008941, ecapa_loss=0.0001346, whisper_loss=0.1039, over 21387.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01034, ecapa_loss=0.0001426, whisper_loss=0.09121, over 3866053.17 frames. ], batch size: 82, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:37:46,820 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 03:38:07,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10500, loss[loss=0.08312, beats_loss=0.011, ecapa_loss=0.0001658, whisper_loss=0.07047, over 15800.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01038, ecapa_loss=0.0001434, whisper_loss=0.09108, over 3885269.04 frames. ], batch size: 62, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:38:08,023 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 03:38:19,770 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 03:38:21,060 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 03:38:26,412 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 40 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 03:38:31,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.826e+01 2.267e+01 2.482e+01 2.765e+01 1.931e+02, threshold=4.963e+01, percent-clipped=1.0 2024-08-19 03:38:35,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4261270.0, ans=0.125 2024-08-19 03:38:37,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4261270.0, ans=0.125 2024-08-19 03:38:49,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2024-08-19 03:38:51,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4261370.0, ans=0.0 2024-08-19 03:39:17,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4261570.0, ans=0.0 2024-08-19 03:39:18,195 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10550, loss[loss=0.07259, beats_loss=0.01095, ecapa_loss=0.0001689, whisper_loss=0.05995, over 20864.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01038, ecapa_loss=0.0001436, whisper_loss=0.09094, over 3878530.26 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:39:19,712 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 03:39:22,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4261570.0, ans=0.1 2024-08-19 03:39:24,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-19 03:39:28,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4261570.0, ans=0.125 2024-08-19 03:39:36,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.63 vs. limit=15.0 2024-08-19 03:39:44,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4261770.0, ans=0.1 2024-08-19 03:39:52,772 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 03:39:54,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4261770.0, ans=0.125 2024-08-19 03:39:54,219 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:40:07,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4261870.0, ans=0.125 2024-08-19 03:40:23,241 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 03:40:23,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.51 vs. 
limit=15.0 2024-08-19 03:40:26,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4261970.0, ans=0.125 2024-08-19 03:40:28,610 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10600, loss[loss=0.09224, beats_loss=0.01243, ecapa_loss=0.0001175, whisper_loss=0.07864, over 15510.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01041, ecapa_loss=0.000143, whisper_loss=0.09066, over 3882858.45 frames. ], batch size: 61, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:40:33,449 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4262070.0, ans=0.0 2024-08-19 03:40:52,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.423e+01 2.630e+01 2.907e+01 3.949e+01, threshold=5.261e+01, percent-clipped=0.0 2024-08-19 03:41:09,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.42 vs. limit=15.0 2024-08-19 03:41:12,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4262370.0, ans=0.1 2024-08-19 03:41:23,403 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 22 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 03:41:27,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4262470.0, ans=0.125 2024-08-19 03:41:37,487 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10650, loss[loss=0.09797, beats_loss=0.01171, ecapa_loss=0.0001413, whisper_loss=0.08484, over 21628.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001421, whisper_loss=0.09107, over 3891911.52 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:41:43,030 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
23 from LS+wenet, 14 from Vox, 21 fro AS 2024-08-19 03:41:49,444 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 03:42:02,963 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 19 from Vox, 46 fro AS 2024-08-19 03:42:11,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.53 vs. limit=10.0 2024-08-19 03:42:11,490 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 03:42:24,642 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 03:42:27,251 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 03:42:28,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4262870.0, ans=0.5 2024-08-19 03:42:35,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4262970.0, ans=0.0 2024-08-19 03:42:42,487 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 03:42:45,238 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 03:42:46,296 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10700, loss[loss=0.09805, beats_loss=0.009067, ecapa_loss=0.0001487, whisper_loss=0.0875, over 18327.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.09137, over 3896082.18 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:42:58,697 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.59 vs. 
limit=12.0 2024-08-19 03:42:58,712 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=12.0 2024-08-19 03:43:03,114 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 03:43:07,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4263170.0, ans=0.0 2024-08-19 03:43:09,189 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.321e+01 2.471e+01 2.734e+01 8.130e+01, threshold=4.942e+01, percent-clipped=1.0 2024-08-19 03:43:27,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4263370.0, ans=0.0 2024-08-19 03:43:27,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4263370.0, ans=0.1 2024-08-19 03:43:35,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4263370.0, ans=0.0 2024-08-19 03:43:42,328 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2024-08-19 03:43:46,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4263470.0, ans=0.1 2024-08-19 03:43:53,785 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10750, loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.000153, whisper_loss=0.08939, over 13704.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01045, ecapa_loss=0.0001413, whisper_loss=0.09165, over 3890736.61 frames. 
], batch size: 57, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 03:43:55,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4263570.0, ans=0.2 2024-08-19 03:44:16,985 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 03:44:33,328 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4263870.0, ans=0.125 2024-08-19 03:44:50,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4263970.0, ans=0.125 2024-08-19 03:44:55,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.94 vs. limit=15.0 2024-08-19 03:44:57,203 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 03:44:58,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4263970.0, ans=0.2 2024-08-19 03:45:00,434 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10800, loss[loss=0.1083, beats_loss=0.008765, ecapa_loss=0.000146, whisper_loss=0.09805, over 16789.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.09168, over 3887843.82 frames. ], batch size: 63, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:45:09,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4264070.0, ans=0.0 2024-08-19 03:45:23,069 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
27 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 03:45:25,265 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.001e+01 2.325e+01 2.616e+01 2.924e+01 8.173e+01, threshold=5.233e+01, percent-clipped=1.0 2024-08-19 03:45:27,167 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 32 from LS+wenet, 15 from Vox, 37 fro AS 2024-08-19 03:45:27,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4264270.0, ans=0.2 2024-08-19 03:45:34,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4264270.0, ans=0.035 2024-08-19 03:45:43,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-08-19 03:45:44,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.15 vs. limit=10.0 2024-08-19 03:45:51,109 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 03:46:09,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10850, loss[loss=0.1019, beats_loss=0.01203, ecapa_loss=0.0001116, whisper_loss=0.0888, over 18681.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01048, ecapa_loss=0.0001414, whisper_loss=0.09182, over 3900682.75 frames. ], batch size: 72, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:46:10,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.17 vs. limit=15.0 2024-08-19 03:46:16,931 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
23 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 03:46:25,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4264670.0, ans=0.0 2024-08-19 03:46:35,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4264670.0, ans=0.1 2024-08-19 03:46:35,878 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4264670.0, ans=0.125 2024-08-19 03:46:37,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4264770.0, ans=0.0 2024-08-19 03:47:14,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=15.0 2024-08-19 03:47:29,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4264870.0, ans=0.2 2024-08-19 03:47:51,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10900, loss[loss=0.1195, beats_loss=0.008718, ecapa_loss=0.0001361, whisper_loss=0.1094, over 22513.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01046, ecapa_loss=0.0001411, whisper_loss=0.09201, over 3941991.23 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:48:19,165 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.006e+01 2.332e+01 2.581e+01 2.915e+01 5.254e+01, threshold=5.161e+01, percent-clipped=1.0 2024-08-19 03:48:26,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4265270.0, ans=0.125 2024-08-19 03:48:46,039 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=12.0 2024-08-19 03:49:00,186 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 30 from Vox, 37 from AS 2024-08-19 03:49:01,765 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 from AS 2024-08-19 03:49:02,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4265470.0, ans=0.0 2024-08-19 03:49:08,519 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 10950, loss[loss=0.1059, beats_loss=0.01187, ecapa_loss=0.0001712, whisper_loss=0.09236, over 21049.00 frames. ], tot_loss[loss=0.1039, beats_loss=0.01045, ecapa_loss=0.0001423, whisper_loss=0.09201, over 3938079.80 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:49:28,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4265670.0, ans=0.07 2024-08-19 03:49:31,740 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 24 from Vox, 29 from AS 2024-08-19 03:49:34,512 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 16 from LS+wenet, 20 from Vox, 38 from AS 2024-08-19 03:49:37,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4265770.0, ans=0.2 2024-08-19 03:49:40,209 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts.
20 from LS+wenet, 18 from Vox, 29 from AS 2024-08-19 03:49:44,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4265770.0, ans=0.125 2024-08-19 03:49:47,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4265770.0, ans=0.125 2024-08-19 03:49:55,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4265870.0, ans=0.0 2024-08-19 03:50:16,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4265970.0, ans=0.1 2024-08-19 03:50:16,827 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2024-08-19 03:50:20,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4265970.0, ans=0.125 2024-08-19 03:50:22,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11000, loss[loss=0.1188, beats_loss=0.009124, ecapa_loss=0.0001664, whisper_loss=0.108, over 17826.00 frames. ], tot_loss[loss=0.104, beats_loss=0.01041, ecapa_loss=0.0001424, whisper_loss=0.09213, over 3922550.48 frames.
], batch size: 70, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:50:24,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4266070.0, ans=0.125 2024-08-19 03:50:48,750 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.855e+01 2.345e+01 2.567e+01 2.803e+01 3.050e+02, threshold=5.135e+01, percent-clipped=1.0 2024-08-19 03:50:49,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4266170.0, ans=0.2 2024-08-19 03:50:55,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 03:50:58,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4266270.0, ans=0.2 2024-08-19 03:51:33,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4266470.0, ans=0.125 2024-08-19 03:51:35,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11050, loss[loss=0.08228, beats_loss=0.012, ecapa_loss=0.0001553, whisper_loss=0.06873, over 21610.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01035, ecapa_loss=0.0001433, whisper_loss=0.09204, over 3913716.96 frames. ], batch size: 89, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:51:57,991 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 15 from LS+wenet, 20 from Vox, 28 from AS 2024-08-19 03:52:15,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4266770.0, ans=0.125 2024-08-19 03:52:25,735 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts.
16 from LS+wenet, 20 from Vox, 27 from AS 2024-08-19 03:52:26,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4266870.0, ans=0.1 2024-08-19 03:52:36,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4266970.0, ans=0.2 2024-08-19 03:52:38,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-08-19 03:52:40,580 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 24 from Vox, 39 from AS 2024-08-19 03:52:44,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=4266970.0, ans=10.0 2024-08-19 03:52:49,142 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11100, loss[loss=0.08167, beats_loss=0.01013, ecapa_loss=0.0001753, whisper_loss=0.06979, over 16814.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01029, ecapa_loss=0.0001438, whisper_loss=0.09161, over 3901047.12 frames. ], batch size: 72, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:53:14,772 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.398e+01 2.605e+01 2.848e+01 4.368e+01, threshold=5.210e+01, percent-clipped=0.0 2024-08-19 03:53:25,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4267270.0, ans=0.125 2024-08-19 03:53:28,768 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs.
limit=15.0 2024-08-19 03:53:32,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4267370.0, ans=0.1 2024-08-19 03:54:00,473 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11150, loss[loss=0.1225, beats_loss=0.01073, ecapa_loss=0.0001278, whisper_loss=0.1104, over 21503.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01023, ecapa_loss=0.0001428, whisper_loss=0.09157, over 3881363.00 frames. ], batch size: 82, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:54:02,733 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4267570.0, ans=0.0 2024-08-19 03:54:05,129 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 03:54:15,126 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 28 from LS+wenet, 11 from Vox, 18 from AS 2024-08-19 03:54:17,737 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 20 from LS+wenet, 30 from Vox, 37 from AS 2024-08-19 03:54:28,570 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4267770.0, ans=0.5 2024-08-19 03:55:03,873 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 22 from Vox, 44 from AS 2024-08-19 03:55:06,233 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 24 from Vox, 21 from AS 2024-08-19 03:55:08,977 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 03:55:11,493 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11200, loss[loss=0.1077, beats_loss=0.008492, ecapa_loss=0.0001351, whisper_loss=0.09781, over 21288.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01026, ecapa_loss=0.0001425, whisper_loss=0.09121, over 3853560.34 frames.
], batch size: 84, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:55:15,617 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4268070.0, ans=0.0 2024-08-19 03:55:29,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4268170.0, ans=0.125 2024-08-19 03:55:32,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4268170.0, ans=0.04949747468305833 2024-08-19 03:55:37,724 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.683e+01 2.383e+01 2.571e+01 2.965e+01 4.836e+01, threshold=5.143e+01, percent-clipped=0.0 2024-08-19 03:55:59,505 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 03:56:26,216 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11250, loss[loss=0.123, beats_loss=0.009086, ecapa_loss=0.0001429, whisper_loss=0.1125, over 22514.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01033, ecapa_loss=0.0001425, whisper_loss=0.09129, over 3862463.44 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:56:26,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4268570.0, ans=0.125 2024-08-19 03:56:35,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4268570.0, ans=0.125 2024-08-19 03:56:38,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4268570.0, ans=0.125 2024-08-19 03:56:53,081 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 28 from LS+wenet, 31 from Vox, 35 from AS 2024-08-19 03:56:59,408 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
22 from LS+wenet, 20 from Vox, 32 from AS 2024-08-19 03:57:11,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4268870.0, ans=0.2 2024-08-19 03:57:21,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4268870.0, ans=0.2 2024-08-19 03:57:21,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4268870.0, ans=0.1 2024-08-19 03:57:23,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4268870.0, ans=0.2 2024-08-19 03:57:40,753 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11300, loss[loss=0.1056, beats_loss=0.009408, ecapa_loss=0.0001454, whisper_loss=0.09477, over 21610.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001423, whisper_loss=0.09111, over 3867142.47 frames. ], batch size: 88, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:57:41,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.65 vs. limit=15.0 2024-08-19 03:57:46,431 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 16 from Vox, 38 from AS 2024-08-19 03:57:48,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4269070.0, ans=0.125 2024-08-19 03:57:52,047 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
25 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 03:58:03,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4269170.0, ans=0.0 2024-08-19 03:58:05,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.950e+01 2.405e+01 2.659e+01 2.967e+01 4.406e+01, threshold=5.318e+01, percent-clipped=0.0 2024-08-19 03:58:07,542 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 13 from Vox, 38 from AS 2024-08-19 03:58:48,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4269470.0, ans=0.0 2024-08-19 03:58:50,325 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11350, loss[loss=0.1165, beats_loss=0.007432, ecapa_loss=0.0001416, whisper_loss=0.1076, over 14667.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001412, whisper_loss=0.09124, over 3887902.71 frames. ], batch size: 56, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 03:59:05,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4269670.0, ans=0.0 2024-08-19 03:59:30,286 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 from AS 2024-08-19 03:59:37,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=4269870.0, ans=0.2 2024-08-19 03:59:49,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4269970.0, ans=0.125 2024-08-19 03:59:53,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4269970.0, ans=0.125 2024-08-19 03:59:55,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=3.82 vs.
limit=12.0 2024-08-19 04:00:03,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4270070.0, ans=0.05 2024-08-19 04:00:04,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11400, loss[loss=0.08731, beats_loss=0.009422, ecapa_loss=0.0001627, whisper_loss=0.07626, over 13749.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.09135, over 3860623.54 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:00:11,917 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 19 from Vox, 27 from AS 2024-08-19 04:00:14,561 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 26 from Vox, 35 from AS 2024-08-19 04:00:25,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4270170.0, ans=0.125 2024-08-19 04:00:30,695 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.261e+01 2.453e+01 2.815e+01 3.733e+01, threshold=4.905e+01, percent-clipped=0.0 2024-08-19 04:00:38,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4270270.0, ans=0.0 2024-08-19 04:00:38,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-19 04:00:56,598 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 from AS 2024-08-19 04:01:02,244 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 13 from Vox, 26 from AS 2024-08-19 04:01:16,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11450, loss[loss=0.09222, beats_loss=0.009416, ecapa_loss=0.000159, whisper_loss=0.08122, over 17516.00 frames.
], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001413, whisper_loss=0.09096, over 3852859.43 frames. ], batch size: 70, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:01:21,006 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 22 from Vox, 40 from AS 2024-08-19 04:01:23,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4270570.0, ans=0.0 2024-08-19 04:01:39,517 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 24 from Vox, 36 from AS 2024-08-19 04:01:48,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4270770.0, ans=0.125 2024-08-19 04:01:49,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4270770.0, ans=0.1 2024-08-19 04:01:52,205 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 38 from LS+wenet, 18 from Vox, 33 from AS 2024-08-19 04:02:04,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4270870.0, ans=0.125 2024-08-19 04:02:05,365 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 17 from Vox, 34 from AS 2024-08-19 04:02:11,041 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 from AS 2024-08-19 04:02:27,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.82 vs. limit=22.5 2024-08-19 04:02:29,411 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11500, loss[loss=0.1196, beats_loss=0.00811, ecapa_loss=0.0001566, whisper_loss=0.11, over 19865.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001415, whisper_loss=0.09117, over 3883355.70 frames.
], batch size: 79, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:02:41,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4271070.0, ans=0.125 2024-08-19 04:02:57,423 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.365e+01 2.674e+01 3.024e+01 4.760e+02, threshold=5.347e+01, percent-clipped=3.0 2024-08-19 04:03:00,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4271270.0, ans=0.125 2024-08-19 04:03:12,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4271270.0, ans=0.2 2024-08-19 04:03:12,692 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4271270.0, ans=0.04949747468305833 2024-08-19 04:03:15,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4271370.0, ans=0.0 2024-08-19 04:03:19,520 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 20 from Vox, 33 from AS 2024-08-19 04:03:27,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4271370.0, ans=0.0 2024-08-19 04:03:31,096 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2024-08-19 04:03:35,843 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.73 vs. limit=22.5 2024-08-19 04:03:48,908 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11550, loss[loss=0.1175, beats_loss=0.00846, ecapa_loss=0.0001426, whisper_loss=0.1076, over 21880.00 frames.
], tot_loss[loss=0.1028, beats_loss=0.01037, ecapa_loss=0.000141, whisper_loss=0.091, over 3833454.79 frames. ], batch size: 87, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:04:04,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4271670.0, ans=0.2 2024-08-19 04:04:06,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-08-19 04:04:13,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4271670.0, ans=0.0 2024-08-19 04:04:15,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4271670.0, ans=0.125 2024-08-19 04:04:18,627 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.75 vs. limit=15.0 2024-08-19 04:04:23,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4271770.0, ans=0.125 2024-08-19 04:04:25,177 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=15.0 2024-08-19 04:04:26,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4271770.0, ans=0.0 2024-08-19 04:04:48,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4271870.0, ans=0.0 2024-08-19 04:04:48,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4271870.0, ans=0.125 2024-08-19 04:04:51,984 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 04:05:02,225 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 24 from Vox, 35 from AS 2024-08-19 04:05:07,667 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11600, loss[loss=0.102, beats_loss=0.0105, ecapa_loss=0.0001187, whisper_loss=0.09032, over 23268.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001402, whisper_loss=0.09092, over 3866107.95 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:05:11,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4272070.0, ans=0.5 2024-08-19 04:05:11,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4272070.0, ans=0.125 2024-08-19 04:05:14,641 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:05:21,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4272070.0, ans=0.125 2024-08-19 04:05:31,254 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4272170.0, ans=0.0 2024-08-19 04:05:33,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.44 vs.
limit=22.5 2024-08-19 04:05:35,358 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.347e+01 2.564e+01 2.905e+01 5.911e+01, threshold=5.128e+01, percent-clipped=1.0 2024-08-19 04:05:50,050 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4272270.0, ans=0.2 2024-08-19 04:05:54,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.99 vs. limit=10.0 2024-08-19 04:06:09,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4272370.0, ans=0.125 2024-08-19 04:06:23,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4272470.0, ans=0.1 2024-08-19 04:06:29,694 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11650, loss[loss=0.1119, beats_loss=0.009835, ecapa_loss=0.0001733, whisper_loss=0.1004, over 15897.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.09044, over 3881154.59 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:06:45,104 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 21 from Vox, 35 from AS 2024-08-19 04:06:50,201 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 from AS 2024-08-19 04:06:55,401 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 from AS 2024-08-19 04:07:00,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4272670.0, ans=0.125 2024-08-19 04:07:11,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=10.68 vs.
limit=12.0 2024-08-19 04:07:24,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4272870.0, ans=0.0 2024-08-19 04:07:35,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4272870.0, ans=0.0 2024-08-19 04:07:36,716 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 22 from Vox, 26 from AS 2024-08-19 04:07:39,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4272970.0, ans=0.0 2024-08-19 04:07:39,270 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4272970.0, ans=0.0 2024-08-19 04:07:40,215 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 16 from Vox, 33 from AS 2024-08-19 04:07:54,687 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11700, loss[loss=0.09326, beats_loss=0.01014, ecapa_loss=0.0001451, whisper_loss=0.08167, over 14054.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.09128, over 3894826.14 frames.
], batch size: 56, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:08:05,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4273070.0, ans=0.125 2024-08-19 04:08:15,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4273170.0, ans=0.125 2024-08-19 04:08:16,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4273170.0, ans=0.125 2024-08-19 04:08:23,707 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.836e+01 2.305e+01 2.645e+01 2.900e+01 9.382e+01, threshold=5.291e+01, percent-clipped=2.0 2024-08-19 04:08:36,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4273270.0, ans=0.125 2024-08-19 04:08:37,165 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0 2024-08-19 04:08:40,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4273270.0, ans=0.125 2024-08-19 04:08:41,303 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 25 from LS+wenet, 25 from Vox, 25 from AS 2024-08-19 04:08:51,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.77 vs.
limit=15.0 2024-08-19 04:08:52,382 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4273370.0, ans=0.1 2024-08-19 04:09:06,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4273470.0, ans=0.5 2024-08-19 04:09:11,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4273470.0, ans=0.125 2024-08-19 04:09:11,803 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=2.99 vs. limit=15.0 2024-08-19 04:09:14,955 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11750, loss[loss=0.0956, beats_loss=0.01194, ecapa_loss=0.0001458, whisper_loss=0.08221, over 18819.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01052, ecapa_loss=0.0001416, whisper_loss=0.09174, over 3919372.72 frames. ], batch size: 74, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:09:30,321 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 31 from LS+wenet, 18 from Vox, 30 from AS 2024-08-19 04:09:35,097 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 20 from Vox, 33 from AS 2024-08-19 04:09:38,104 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 16 from Vox, 45 from AS 2024-08-19 04:09:50,608 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=12.0 2024-08-19 04:09:52,900 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
15 from LS+wenet, 16 from Vox, 23 from AS 2024-08-19 04:10:05,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4273870.0, ans=0.125 2024-08-19 04:10:15,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4273970.0, ans=0.125 2024-08-19 04:10:32,836 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11800, loss[loss=0.09851, beats_loss=0.009014, ecapa_loss=0.0001668, whisper_loss=0.08783, over 19751.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01052, ecapa_loss=0.0001417, whisper_loss=0.09186, over 3915575.94 frames. ], batch size: 78, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:10:39,647 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 12 from LS+wenet, 20 from Vox, 26 from AS 2024-08-19 04:10:42,638 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4274070.0, ans=0.2 2024-08-19 04:11:03,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.316e+01 2.542e+01 2.977e+01 1.357e+02, threshold=5.084e+01, percent-clipped=2.0 2024-08-19 04:11:06,753 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 22 from LS+wenet, 19 from Vox, 43 from AS 2024-08-19 04:11:22,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.05 vs.
limit=15.0 2024-08-19 04:11:34,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4274370.0, ans=0.025 2024-08-19 04:11:38,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4274470.0, ans=0.2 2024-08-19 04:11:39,245 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.87 vs. limit=12.0 2024-08-19 04:11:43,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4274470.0, ans=0.05 2024-08-19 04:11:45,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4274470.0, ans=0.125 2024-08-19 04:11:54,542 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11850, loss[loss=0.1089, beats_loss=0.01057, ecapa_loss=0.0001643, whisper_loss=0.09666, over 22629.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01052, ecapa_loss=0.0001418, whisper_loss=0.09132, over 3898680.39 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:12:02,542 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS 2024-08-19 04:12:09,342 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.72 vs. limit=10.0 2024-08-19 04:12:11,911 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4274670.0, ans=0.035 2024-08-19 04:12:17,670 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 20 from LS+wenet, 20 from Vox, 34 from AS 2024-08-19 04:12:21,020 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
22 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 04:12:36,806 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 04:12:44,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4274870.0, ans=0.0 2024-08-19 04:12:45,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4274870.0, ans=0.0 2024-08-19 04:12:46,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4274870.0, ans=0.0 2024-08-19 04:12:50,124 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.07 vs. limit=15.0 2024-08-19 04:12:58,648 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4274970.0, ans=0.0 2024-08-19 04:13:11,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4275070.0, ans=0.2 2024-08-19 04:13:11,779 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11900, loss[loss=0.111, beats_loss=0.0104, ecapa_loss=0.0001441, whisper_loss=0.0992, over 15783.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01053, ecapa_loss=0.0001422, whisper_loss=0.09161, over 3904421.56 frames. ], batch size: 64, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:13:39,087 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.362e+01 2.683e+01 3.004e+01 4.414e+01, threshold=5.366e+01, percent-clipped=0.0 2024-08-19 04:13:43,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4275270.0, ans=0.0 2024-08-19 04:13:53,362 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
26 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 04:13:56,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2024-08-19 04:13:58,266 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 04:14:06,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2024-08-19 04:14:11,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4275470.0, ans=0.1 2024-08-19 04:14:19,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4275470.0, ans=0.125 2024-08-19 04:14:20,088 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:14:26,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 11950, loss[loss=0.09964, beats_loss=0.008577, ecapa_loss=0.0001762, whisper_loss=0.0893, over 19569.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01044, ecapa_loss=0.0001431, whisper_loss=0.09125, over 3886463.19 frames. 
], batch size: 81, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:14:27,293 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4275570.0, ans=0.125 2024-08-19 04:14:47,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4275670.0, ans=0.0 2024-08-19 04:14:47,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4275670.0, ans=0.0 2024-08-19 04:14:51,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=15.0 2024-08-19 04:15:05,171 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2024-08-19 04:15:15,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2024-08-19 04:15:24,078 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 04:15:35,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-08-19 04:15:37,466 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12000, loss[loss=0.1144, beats_loss=0.00706, ecapa_loss=0.0001599, whisper_loss=0.1058, over 15384.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0105, ecapa_loss=0.0001428, whisper_loss=0.09087, over 3881516.37 frames. 
], batch size: 60, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:15:37,466 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 04:16:19,301 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on ASR_libri: loss=0.2542, beats_loss=0, ecapa_loss=0.0005284, whisper_loss=0.2489, over 922467.00 frames. 2024-08-19 04:16:36,941 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on SV_voxceleb1: loss=0.004097, beats_loss=0, ecapa_loss=0.0004097, whisper_loss=0, over 939242.00 frames. 2024-08-19 04:17:44,565 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.0355, 1.2576, 1.4362, 0.7908, 1.1630, 1.5097, 1.2209, 1.1094], device='cuda:3') 2024-08-19 04:18:29,299 INFO [train_multi_KD3.py:1149] (3/4) Epoch 29, validation on AT_audioset: loss=0.02313, beats_loss=0.02313, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 04:18:29,303 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 04:18:48,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4276170.0, ans=0.04949747468305833 2024-08-19 04:18:49,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=4276170.0, ans=0.5 2024-08-19 04:18:54,620 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.268e+01 2.505e+01 2.757e+01 3.965e+01, threshold=5.011e+01, percent-clipped=0.0 2024-08-19 04:18:59,516 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4276270.0, ans=0.125 2024-08-19 04:19:31,405 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4276470.0, ans=0.125 2024-08-19 04:19:37,881 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 04:19:38,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12050, loss[loss=0.1011, beats_loss=0.01037, ecapa_loss=0.0001184, whisper_loss=0.08954, over 17572.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01047, ecapa_loss=0.0001428, whisper_loss=0.09072, over 3871002.81 frames. ], batch size: 65, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:19:40,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4276570.0, ans=0.125 2024-08-19 04:19:41,779 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 04:19:49,225 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 04:20:04,740 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4276670.0, ans=0.125 2024-08-19 04:20:34,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4276970.0, ans=0.1 2024-08-19 04:20:37,459 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 19 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 04:20:48,515 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12100, loss[loss=0.1053, beats_loss=0.00939, ecapa_loss=0.0001769, whisper_loss=0.09418, over 16159.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01051, ecapa_loss=0.0001432, whisper_loss=0.09084, over 3897713.19 frames. ], batch size: 67, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:21:03,565 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.40 vs. limit=15.0 2024-08-19 04:21:09,138 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
39 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 04:21:09,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4277170.0, ans=0.2 2024-08-19 04:21:13,378 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.859e+01 2.256e+01 2.611e+01 2.868e+01 1.471e+02, threshold=5.223e+01, percent-clipped=2.0 2024-08-19 04:21:19,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4277270.0, ans=0.125 2024-08-19 04:21:49,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4277470.0, ans=0.125 2024-08-19 04:21:58,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12150, loss[loss=0.1188, beats_loss=0.00957, ecapa_loss=0.0001432, whisper_loss=0.1078, over 22650.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01049, ecapa_loss=0.000143, whisper_loss=0.09095, over 3916430.99 frames. ], batch size: 92, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:22:02,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4277570.0, ans=0.2 2024-08-19 04:22:15,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4277670.0, ans=0.0 2024-08-19 04:22:18,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4277670.0, ans=0.0 2024-08-19 04:22:22,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4277670.0, ans=0.125 2024-08-19 04:22:26,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4277770.0, ans=0.95 2024-08-19 04:22:27,769 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
21 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 04:22:29,597 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2024-08-19 04:22:30,297 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 fro AS 2024-08-19 04:22:41,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4277870.0, ans=0.125 2024-08-19 04:22:56,584 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 04:23:08,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12200, loss[loss=0.1115, beats_loss=0.008783, ecapa_loss=0.0001389, whisper_loss=0.1013, over 17512.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01045, ecapa_loss=0.0001429, whisper_loss=0.09105, over 3907144.57 frames. ], batch size: 68, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:23:14,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4278070.0, ans=0.0 2024-08-19 04:23:38,852 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.338e+01 2.537e+01 2.829e+01 3.837e+02, threshold=5.074e+01, percent-clipped=2.0 2024-08-19 04:24:03,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4278370.0, ans=0.0 2024-08-19 04:24:28,453 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.204e-01 2024-08-19 04:24:31,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12250, loss[loss=0.08845, beats_loss=0.01062, ecapa_loss=0.0001781, whisper_loss=0.07605, over 18784.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001431, whisper_loss=0.09091, over 3935925.48 frames. 
], batch size: 80, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:24:47,131 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0 2024-08-19 04:25:00,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4278670.0, ans=0.125 2024-08-19 04:25:03,000 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05575420707464218, model_norm_threshold=50.743568420410156 2024-08-19 04:25:03,168 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.self_attn_weights.in_proj.bias with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.164e+05, grad_sumsq=1.291e+04, orig_rms_sq=9.017e+00 2024-08-19 04:25:03,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4278670.0, ans=0.1 2024-08-19 04:25:22,985 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 27 from LS+wenet, 13 from Vox, 20 fro AS 2024-08-19 04:25:26,741 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 04:25:48,055 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4278970.0, ans=0.125 2024-08-19 04:25:58,744 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=9.656e+00 2024-08-19 04:25:59,456 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12300, loss[loss=0.09061, beats_loss=0.0109, ecapa_loss=0.0001554, whisper_loss=0.07815, over 22034.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001419, whisper_loss=0.08987, over 3917652.36 frames. 
], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:26:28,339 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 04:26:33,444 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.928e+01 2.490e+01 2.668e+01 2.990e+01 9.101e+02, threshold=5.335e+01, percent-clipped=3.0 2024-08-19 04:26:43,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.85 vs. limit=22.5 2024-08-19 04:26:44,508 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 04:26:46,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4279270.0, ans=0.0 2024-08-19 04:27:05,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4279370.0, ans=0.0 2024-08-19 04:27:20,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4279470.0, ans=0.125 2024-08-19 04:27:20,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4279470.0, ans=0.0 2024-08-19 04:27:27,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4279470.0, ans=0.125 2024-08-19 04:27:30,250 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12350, loss[loss=0.1105, beats_loss=0.009445, ecapa_loss=0.0001281, whisper_loss=0.09981, over 19158.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001427, whisper_loss=0.08944, over 3897357.78 frames. ], batch size: 74, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:27:32,075 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
15 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 04:27:41,716 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-08-19 04:27:41,841 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.34 vs. limit=22.5 2024-08-19 04:28:13,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4279770.0, ans=0.0 2024-08-19 04:28:17,460 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 04:28:19,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4279770.0, ans=0.125 2024-08-19 04:28:27,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.56 vs. limit=10.0 2024-08-19 04:28:37,370 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 04:28:50,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4280070.0, ans=0.125 2024-08-19 04:28:51,343 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12400, loss[loss=0.1042, beats_loss=0.009899, ecapa_loss=0.0001253, whisper_loss=0.09309, over 20936.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001418, whisper_loss=0.09047, over 3874103.07 frames. 
], batch size: 82, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:29:15,116 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.385e+01 2.588e+01 2.896e+01 2.116e+02, threshold=5.177e+01, percent-clipped=1.0 2024-08-19 04:29:20,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4280270.0, ans=0.125 2024-08-19 04:29:34,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.20 vs. limit=15.0 2024-08-19 04:29:43,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4280470.0, ans=0.0 2024-08-19 04:29:57,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12450, loss[loss=0.0849, beats_loss=0.008038, ecapa_loss=0.0001663, whisper_loss=0.0752, over 18658.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01039, ecapa_loss=0.0001427, whisper_loss=0.09076, over 3859114.25 frames. 
], batch size: 76, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:30:00,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4280570.0, ans=0.125 2024-08-19 04:30:01,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4280570.0, ans=0.1 2024-08-19 04:30:02,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4280570.0, ans=0.0 2024-08-19 04:30:04,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4280570.0, ans=0.1 2024-08-19 04:30:34,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4280770.0, ans=0.125 2024-08-19 04:30:38,271 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 04:30:44,983 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 26 from Vox, 23 fro AS 2024-08-19 04:30:56,326 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 29 from LS+wenet, 10 from Vox, 34 fro AS 2024-08-19 04:31:01,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4281070.0, ans=0.035 2024-08-19 04:31:02,552 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12500, loss[loss=0.09694, beats_loss=0.01096, ecapa_loss=0.0001312, whisper_loss=0.08467, over 15327.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01033, ecapa_loss=0.0001421, whisper_loss=0.09143, over 3869717.81 frames. 
], batch size: 58, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:31:11,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4281070.0, ans=0.1 2024-08-19 04:31:12,832 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4281070.0, ans=0.125 2024-08-19 04:31:25,717 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.257e+01 2.522e+01 2.778e+01 4.051e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 04:31:25,872 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 04:31:36,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.40 vs. limit=15.0 2024-08-19 04:31:42,970 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 04:31:46,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4281370.0, ans=0.0 2024-08-19 04:32:05,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2024-08-19 04:32:06,727 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12550, loss[loss=0.1271, beats_loss=0.009424, ecapa_loss=0.0001261, whisper_loss=0.1165, over 23773.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01036, ecapa_loss=0.0001421, whisper_loss=0.09161, over 3902726.07 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:32:09,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4281570.0, ans=0.04949747468305833 2024-08-19 04:32:10,594 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 04:32:11,839 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 04:32:23,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4281670.0, ans=0.5 2024-08-19 04:32:25,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4281670.0, ans=0.125 2024-08-19 04:32:30,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4281670.0, ans=0.125 2024-08-19 04:32:32,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4281770.0, ans=0.125 2024-08-19 04:32:37,660 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0 2024-08-19 04:32:40,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4281770.0, ans=0.125 2024-08-19 04:33:11,139 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12600, loss[loss=0.09487, beats_loss=0.01134, ecapa_loss=0.000152, whisper_loss=0.08201, over 19649.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01033, ecapa_loss=0.000143, whisper_loss=0.09173, over 3907802.32 frames. 
], batch size: 81, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:33:34,969 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.025e+01 2.310e+01 2.550e+01 2.894e+01 3.916e+01, threshold=5.099e+01, percent-clipped=0.0 2024-08-19 04:33:35,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4282170.0, ans=0.125 2024-08-19 04:33:39,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0 2024-08-19 04:33:56,290 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-08-19 04:34:05,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4282470.0, ans=0.0 2024-08-19 04:34:12,350 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 29 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 04:34:16,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12650, loss[loss=0.1146, beats_loss=0.01199, ecapa_loss=0.0001253, whisper_loss=0.1013, over 23063.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01039, ecapa_loss=0.000144, whisper_loss=0.09187, over 3869895.08 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:34:20,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4282570.0, ans=0.0 2024-08-19 04:34:32,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4282670.0, ans=0.125 2024-08-19 04:34:36,410 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 30 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 04:34:42,400 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
28 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 04:34:43,748 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 04:34:55,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.09 vs. limit=22.5 2024-08-19 04:35:04,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=15.0 2024-08-19 04:35:10,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4282970.0, ans=0.04949747468305833 2024-08-19 04:35:12,140 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 23 from Vox, 18 fro AS 2024-08-19 04:35:15,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.78 vs. limit=15.0 2024-08-19 04:35:17,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4282970.0, ans=0.125 2024-08-19 04:35:21,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12700, loss[loss=0.1061, beats_loss=0.01011, ecapa_loss=0.0001703, whisper_loss=0.09429, over 22158.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01039, ecapa_loss=0.0001441, whisper_loss=0.09172, over 3869140.08 frames. ], batch size: 93, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:35:34,755 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
17 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 04:35:42,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4283170.0, ans=0.0 2024-08-19 04:35:44,702 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.838e+01 2.234e+01 2.542e+01 2.819e+01 3.569e+01, threshold=5.084e+01, percent-clipped=0.0 2024-08-19 04:36:03,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4283370.0, ans=0.0 2024-08-19 04:36:25,099 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4283470.0, ans=0.125 2024-08-19 04:36:27,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12750, loss[loss=0.1055, beats_loss=0.01063, ecapa_loss=0.0001338, whisper_loss=0.09353, over 18314.00 frames. ], tot_loss[loss=0.1038, beats_loss=0.01045, ecapa_loss=0.000143, whisper_loss=0.09197, over 3879508.55 frames. ], batch size: 72, lr: 2.10e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 04:36:27,481 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 04:36:42,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4283670.0, ans=0.0 2024-08-19 04:36:55,334 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
25 from LS+wenet, 18 from Vox, 44 fro AS 2024-08-19 04:37:08,686 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4283870.0, ans=0.1 2024-08-19 04:37:15,734 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4283870.0, ans=0.025 2024-08-19 04:37:19,963 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2024-08-19 04:37:24,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4283970.0, ans=0.125 2024-08-19 04:37:24,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4283970.0, ans=0.2 2024-08-19 04:37:31,526 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4283970.0, ans=0.125 2024-08-19 04:37:33,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12800, loss[loss=0.1037, beats_loss=0.01133, ecapa_loss=0.000124, whisper_loss=0.09111, over 22835.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01051, ecapa_loss=0.0001431, whisper_loss=0.09132, over 3912354.23 frames. ], batch size: 90, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:37:41,346 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 37 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 04:37:56,533 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.197e+01 2.426e+01 2.664e+01 3.787e+01, threshold=4.851e+01, percent-clipped=0.0 2024-08-19 04:38:04,163 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
19 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 04:38:04,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2024-08-19 04:38:09,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.00 vs. limit=6.0 2024-08-19 04:38:23,324 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 14 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 04:38:29,013 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4284470.0, ans=0.1 2024-08-19 04:38:37,229 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12850, loss[loss=0.07389, beats_loss=0.009974, ecapa_loss=0.0001864, whisper_loss=0.06205, over 16639.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01053, ecapa_loss=0.0001435, whisper_loss=0.0904, over 3898099.62 frames. ], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:38:53,031 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:39:08,074 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 17 from Vox, 33 fro AS 2024-08-19 04:39:17,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4284870.0, ans=0.1 2024-08-19 04:39:25,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4284870.0, ans=0.125 2024-08-19 04:39:36,680 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 35 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 04:39:40,138 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12900, loss[loss=0.09679, beats_loss=0.01024, ecapa_loss=0.0001412, whisper_loss=0.08513, over 14722.00 frames. 
], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.0001438, whisper_loss=0.08993, over 3869085.95 frames. ], batch size: 58, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:39:44,338 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4285070.0, ans=0.2 2024-08-19 04:39:57,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4285170.0, ans=0.125 2024-08-19 04:40:02,820 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.358e+01 2.618e+01 2.908e+01 5.283e+01, threshold=5.236e+01, percent-clipped=1.0 2024-08-19 04:40:05,613 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 36 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 04:40:29,227 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 04:40:32,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4285470.0, ans=0.1 2024-08-19 04:40:33,101 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:40:34,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4285470.0, ans=0.125 2024-08-19 04:40:42,509 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 12950, loss[loss=0.09217, beats_loss=0.01245, ecapa_loss=0.0001347, whisper_loss=0.07837, over 17575.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001426, whisper_loss=0.09011, over 3886616.85 frames. 
], batch size: 69, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:40:53,259 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.252e-02 2024-08-19 04:41:01,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4285670.0, ans=0.0 2024-08-19 04:41:02,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.06 vs. limit=15.0 2024-08-19 04:41:19,411 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.63 vs. limit=15.0 2024-08-19 04:41:44,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13000, loss[loss=0.08744, beats_loss=0.01109, ecapa_loss=0.0001268, whisper_loss=0.07507, over 20135.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001412, whisper_loss=0.08989, over 3890404.16 frames. ], batch size: 79, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:41:45,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4286070.0, ans=0.125 2024-08-19 04:42:06,517 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.266e+01 2.592e+01 2.874e+01 4.367e+01, threshold=5.183e+01, percent-clipped=0.0 2024-08-19 04:42:08,185 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4286270.0, ans=0.07 2024-08-19 04:42:21,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.31 vs. 
limit=10.0 2024-08-19 04:42:34,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4286470.0, ans=0.04949747468305833 2024-08-19 04:42:43,266 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 04:42:46,825 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13050, loss[loss=0.1178, beats_loss=0.009446, ecapa_loss=0.0001882, whisper_loss=0.1065, over 22103.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001412, whisper_loss=0.09057, over 3884999.49 frames. ], batch size: 91, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:42:49,839 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2024-08-19 04:43:03,505 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 04:43:05,667 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5 2024-08-19 04:43:10,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4286670.0, ans=0.0 2024-08-19 04:43:17,296 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 15 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 04:43:19,006 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 30 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 04:43:21,194 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 04:43:35,620 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 40 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 04:43:41,240 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
21 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 04:43:47,077 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4286970.0, ans=0.0 2024-08-19 04:43:57,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13100, loss[loss=0.09386, beats_loss=0.01213, ecapa_loss=0.0001289, whisper_loss=0.08044, over 17792.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.09041, over 3895118.28 frames. ], batch size: 71, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:44:13,247 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-19 04:44:14,439 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 04:44:23,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.776e+01 2.315e+01 2.592e+01 2.960e+01 1.126e+02, threshold=5.185e+01, percent-clipped=1.0 2024-08-19 04:44:25,206 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 23 from Vox, 21 fro AS 2024-08-19 04:44:29,707 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 04:44:47,284 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
9 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 04:44:47,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4287370.0, ans=0.125 2024-08-19 04:44:54,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4287370.0, ans=0.1 2024-08-19 04:45:05,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4287470.0, ans=0.0 2024-08-19 04:45:07,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=4287470.0, ans=0.025 2024-08-19 04:45:07,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4287470.0, ans=0.1 2024-08-19 04:45:07,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4287470.0, ans=0.0 2024-08-19 04:45:14,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13150, loss[loss=0.1066, beats_loss=0.006792, ecapa_loss=0.0001594, whisper_loss=0.09817, over 14455.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001413, whisper_loss=0.09014, over 3885503.97 frames. ], batch size: 55, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:45:14,553 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 04:45:17,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4287570.0, ans=0.1 2024-08-19 04:45:18,606 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 04:45:22,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4287570.0, ans=0.125 2024-08-19 04:45:46,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4287770.0, ans=0.025 2024-08-19 04:46:09,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4287870.0, ans=0.125 2024-08-19 04:46:12,894 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 04:46:13,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2024-08-19 04:46:25,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4287970.0, ans=0.125 2024-08-19 04:46:30,680 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13200, loss[loss=0.08375, beats_loss=0.01008, ecapa_loss=0.000105, whisper_loss=0.07263, over 17585.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001414, whisper_loss=0.09036, over 3864513.63 frames. ], batch size: 66, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:46:35,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4288070.0, ans=0.125 2024-08-19 04:46:39,449 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-19 04:46:58,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4288170.0, ans=0.0 2024-08-19 04:46:59,770 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.394e+01 2.675e+01 2.983e+01 8.603e+01, threshold=5.350e+01, percent-clipped=2.0 2024-08-19 04:46:59,917 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 15 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 04:47:29,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-08-19 04:47:46,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4288470.0, ans=0.125 2024-08-19 04:47:47,788 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 20 from Vox, 23 fro AS 2024-08-19 04:47:49,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13250, loss[loss=0.08451, beats_loss=0.0101, ecapa_loss=0.0001752, whisper_loss=0.07266, over 14065.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01049, ecapa_loss=0.0001427, whisper_loss=0.09004, over 3860330.26 frames. ], batch size: 58, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:47:51,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4288570.0, ans=0.0 2024-08-19 04:47:56,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4288570.0, ans=0.125 2024-08-19 04:48:07,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4288670.0, ans=0.125 2024-08-19 04:48:20,938 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 29 from Vox, 35 fro AS 2024-08-19 04:49:07,109 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13300, loss[loss=0.103, beats_loss=0.01189, ecapa_loss=0.0001113, whisper_loss=0.08998, over 19251.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.09049, over 3867205.90 frames. ], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:49:17,969 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2024-08-19 04:49:27,910 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4289170.0, ans=0.125 2024-08-19 04:49:34,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.945e+01 2.357e+01 2.503e+01 2.761e+01 3.724e+01, threshold=5.005e+01, percent-clipped=0.0 2024-08-19 04:49:38,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4289270.0, ans=0.1 2024-08-19 04:49:43,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-08-19 04:49:45,969 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 04:50:01,455 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.21 vs. limit=22.5 2024-08-19 04:50:16,734 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 34 from LS+wenet, 12 from Vox, 39 fro AS 2024-08-19 04:50:24,062 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13350, loss[loss=0.1209, beats_loss=0.00909, ecapa_loss=0.000168, whisper_loss=0.1102, over 18443.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001419, whisper_loss=0.09091, over 3838464.01 frames. ], batch size: 76, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:50:28,790 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 04:50:30,466 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 04:50:43,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4289670.0, ans=0.125 2024-08-19 04:50:58,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4289770.0, ans=0.125 2024-08-19 04:50:58,969 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 04:51:08,845 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 18 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 04:51:09,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4289770.0, ans=0.05 2024-08-19 04:51:10,346 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=22.5 2024-08-19 04:51:28,847 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4289970.0, ans=0.0 2024-08-19 04:51:35,093 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 16 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 04:51:40,242 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13400, loss[loss=0.1301, beats_loss=0.008478, ecapa_loss=0.0001892, whisper_loss=0.1198, over 14060.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001416, whisper_loss=0.09046, over 3814425.03 frames. 
], batch size: 57, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:51:40,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4290070.0, ans=0.125 2024-08-19 04:51:51,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4290070.0, ans=0.0 2024-08-19 04:51:53,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4290170.0, ans=0.0 2024-08-19 04:52:01,544 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 04:52:05,746 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.283e+01 2.575e+01 2.794e+01 4.211e+01, threshold=5.151e+01, percent-clipped=0.0 2024-08-19 04:52:08,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4290270.0, ans=0.0 2024-08-19 04:52:24,168 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4290370.0, ans=0.0 2024-08-19 04:52:25,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4290370.0, ans=0.0 2024-08-19 04:52:33,484 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 19 from Vox, 42 fro AS 2024-08-19 04:52:52,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13450, loss[loss=0.1046, beats_loss=0.01042, ecapa_loss=0.0001397, whisper_loss=0.09282, over 19519.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001421, whisper_loss=0.09017, over 3833045.57 frames. 
], batch size: 77, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:52:59,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4290570.0, ans=0.2 2024-08-19 04:53:05,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4290570.0, ans=0.125 2024-08-19 04:53:05,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4290570.0, ans=0.04949747468305833 2024-08-19 04:53:12,315 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4290670.0, ans=0.2 2024-08-19 04:53:15,183 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 04:53:19,064 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 04:53:19,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.87 vs. 
limit=15.0 2024-08-19 04:53:26,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4290770.0, ans=0.1 2024-08-19 04:53:28,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4290770.0, ans=0.05 2024-08-19 04:53:36,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4290770.0, ans=0.04949747468305833 2024-08-19 04:53:59,191 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4290970.0, ans=0.125 2024-08-19 04:54:01,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4290970.0, ans=0.2 2024-08-19 04:54:10,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13500, loss[loss=0.09365, beats_loss=0.01271, ecapa_loss=0.0001525, whisper_loss=0.07941, over 20269.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01043, ecapa_loss=0.0001422, whisper_loss=0.09041, over 3873351.89 frames. ], batch size: 87, lr: 2.10e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:54:14,832 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.37 vs. limit=22.5 2024-08-19 04:54:21,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4291070.0, ans=0.0 2024-08-19 04:54:31,531 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.64 vs. 
limit=15.0 2024-08-19 04:54:37,460 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.892e+01 2.302e+01 2.521e+01 2.822e+01 3.950e+01, threshold=5.042e+01, percent-clipped=0.0 2024-08-19 04:55:01,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4291370.0, ans=0.125 2024-08-19 04:55:20,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=15.0 2024-08-19 04:55:23,451 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13550, loss[loss=0.1007, beats_loss=0.0109, ecapa_loss=0.0001244, whisper_loss=0.0886, over 22650.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001424, whisper_loss=0.09059, over 3856629.41 frames. ], batch size: 88, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:55:35,860 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.90 vs. limit=10.0 2024-08-19 04:55:38,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4291670.0, ans=0.125 2024-08-19 04:55:42,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4291670.0, ans=10.0 2024-08-19 04:55:44,014 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 04:55:58,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4291770.0, ans=0.0 2024-08-19 04:56:04,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4291770.0, ans=0.125 2024-08-19 04:56:26,175 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
21 from LS+wenet, 29 from Vox, 40 fro AS 2024-08-19 04:56:28,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4291970.0, ans=0.125 2024-08-19 04:56:31,902 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 04:56:33,850 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 15 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 04:56:36,275 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13600, loss[loss=0.08765, beats_loss=0.007906, ecapa_loss=0.0001482, whisper_loss=0.07826, over 15750.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.0001416, whisper_loss=0.08992, over 3859484.01 frames. ], batch size: 62, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:56:40,358 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 15 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 04:56:52,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4292170.0, ans=0.0 2024-08-19 04:57:01,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4292170.0, ans=0.125 2024-08-19 04:57:02,143 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.943e+01 2.298e+01 2.598e+01 3.003e+01 1.611e+02, threshold=5.196e+01, percent-clipped=4.0 2024-08-19 04:57:06,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2024-08-19 04:57:10,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4292270.0, ans=0.125 2024-08-19 04:57:38,688 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 04:57:50,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4292570.0, ans=0.2 2024-08-19 04:57:50,876 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13650, loss[loss=0.08139, beats_loss=0.012, ecapa_loss=0.000112, whisper_loss=0.06827, over 16701.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01074, ecapa_loss=0.0001416, whisper_loss=0.08878, over 3838675.59 frames. ], batch size: 66, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:58:16,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2024-08-19 04:58:23,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4292770.0, ans=0.1 2024-08-19 04:58:39,982 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4292870.0, ans=0.05 2024-08-19 04:58:50,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4292870.0, ans=0.5 2024-08-19 04:59:05,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4292970.0, ans=0.0 2024-08-19 04:59:08,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13700, loss[loss=0.08015, beats_loss=0.01117, ecapa_loss=0.0001633, whisper_loss=0.06736, over 15355.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01075, ecapa_loss=0.0001412, whisper_loss=0.08848, over 3834472.26 frames. 
], batch size: 64, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 04:59:38,033 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.742e+01 2.235e+01 2.503e+01 2.717e+01 3.786e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-19 04:59:52,800 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 05:00:00,872 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4293370.0, ans=0.09899494936611666 2024-08-19 05:00:28,133 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13750, loss[loss=0.1176, beats_loss=0.009654, ecapa_loss=0.0001328, whisper_loss=0.1066, over 23798.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.0107, ecapa_loss=0.0001412, whisper_loss=0.08906, over 3862534.25 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:00:28,834 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4293570.0, ans=0.0 2024-08-19 05:00:45,442 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=12.0 2024-08-19 05:00:52,103 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 05:00:53,413 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 15 from Vox, 33 fro AS 2024-08-19 05:01:02,324 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:01:02,334 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.330e+05 2024-08-19 05:01:16,344 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
34 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 05:01:16,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4293870.0, ans=0.2 2024-08-19 05:01:20,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4293870.0, ans=0.125 2024-08-19 05:01:27,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4293970.0, ans=0.125 2024-08-19 05:01:34,510 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2024-08-19 05:01:39,183 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13800, loss[loss=0.1011, beats_loss=0.01302, ecapa_loss=0.0001124, whisper_loss=0.08694, over 19980.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01067, ecapa_loss=0.0001415, whisper_loss=0.08952, over 3875270.93 frames. ], batch size: 78, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:01:51,433 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 29 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 05:02:02,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.948e+01 2.336e+01 2.495e+01 2.875e+01 4.670e+01, threshold=4.991e+01, percent-clipped=0.0 2024-08-19 05:02:12,181 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
29 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 05:02:20,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4294370.0, ans=0.1 2024-08-19 05:02:25,768 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4294370.0, ans=0.125 2024-08-19 05:02:29,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4294370.0, ans=0.1 2024-08-19 05:02:29,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4294370.0, ans=0.125 2024-08-19 05:02:45,069 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13850, loss[loss=0.08875, beats_loss=0.01067, ecapa_loss=0.0001099, whisper_loss=0.07699, over 14807.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01065, ecapa_loss=0.0001404, whisper_loss=0.08942, over 3851312.72 frames. 
], batch size: 56, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:02:48,085 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4294570.0, ans=0.125 2024-08-19 05:03:09,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4294670.0, ans=0.1 2024-08-19 05:03:11,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4294770.0, ans=0.125 2024-08-19 05:03:14,154 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4294770.0, ans=0.125 2024-08-19 05:03:17,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4294770.0, ans=0.125 2024-08-19 05:03:50,098 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13900, loss[loss=0.1081, beats_loss=0.01039, ecapa_loss=0.000136, whisper_loss=0.09634, over 22598.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01062, ecapa_loss=0.0001402, whisper_loss=0.09001, over 3853948.24 frames. ], batch size: 89, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:03:54,761 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:03:56,504 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.40 vs. 
limit=22.5 2024-08-19 05:04:12,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4295170.0, ans=0.0 2024-08-19 05:04:13,641 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.793e+01 2.343e+01 2.547e+01 2.776e+01 4.991e+01, threshold=5.095e+01, percent-clipped=1.0 2024-08-19 05:04:18,924 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2024-08-19 05:04:18,946 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.23 vs. limit=6.0 2024-08-19 05:04:29,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4295370.0, ans=0.1 2024-08-19 05:04:45,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4295470.0, ans=0.125 2024-08-19 05:04:46,640 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-08-19 05:04:47,324 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 12 from LS+wenet, 17 from Vox, 25 from AS 2024-08-19 05:04:54,588 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.15 vs. limit=6.0 2024-08-19 05:04:56,503 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 13950, loss[loss=0.1063, beats_loss=0.01269, ecapa_loss=0.0001216, whisper_loss=0.09241, over 22484.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01059, ecapa_loss=0.000141, whisper_loss=0.08988, over 3871940.30 frames. 
], batch size: 88, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:04:57,569 WARNING [optim.py:496] (3/4) Scaling gradients by 0.029894206672906876, model_norm_threshold=50.94768524169922 2024-08-19 05:04:57,742 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.16, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.708e+05, grad_sumsq=1.429e+05, orig_rms_sq=3.294e+00 2024-08-19 05:05:03,144 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 21 from Vox, 37 from AS 2024-08-19 05:05:15,066 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 25 from LS+wenet, 21 from Vox, 26 from AS 2024-08-19 05:05:24,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4295770.0, ans=0.125 2024-08-19 05:05:25,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4295770.0, ans=0.2 2024-08-19 05:05:25,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-08-19 05:05:27,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4295770.0, ans=0.2 2024-08-19 05:05:29,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4295770.0, ans=0.125 2024-08-19 05:05:33,532 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 22 from Vox, 29 from AS 2024-08-19 05:05:51,427 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.13 vs. 
limit=15.0 2024-08-19 05:06:02,420 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14000, loss[loss=0.08856, beats_loss=0.01101, ecapa_loss=0.0001365, whisper_loss=0.07619, over 20510.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01053, ecapa_loss=0.0001412, whisper_loss=0.08974, over 3836670.37 frames. ], batch size: 84, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:06:06,390 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-19 05:06:21,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4296170.0, ans=0.125 2024-08-19 05:06:26,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.077e+01 2.382e+01 2.706e+01 3.107e+01 1.704e+03, threshold=5.412e+01, percent-clipped=4.0 2024-08-19 05:06:33,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4296270.0, ans=0.2 2024-08-19 05:06:34,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2024-08-19 05:06:38,578 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 30 from Vox, 30 from AS 2024-08-19 05:06:43,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4296370.0, ans=0.1 2024-08-19 05:06:47,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4296370.0, ans=0.0 2024-08-19 05:07:05,979 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.15 vs. 
limit=22.5 2024-08-19 05:07:07,732 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14050, loss[loss=0.1127, beats_loss=0.009888, ecapa_loss=0.0001484, whisper_loss=0.1013, over 18759.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001416, whisper_loss=0.09015, over 3836043.28 frames. ], batch size: 73, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:07:09,241 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 26 from Vox, 33 from AS 2024-08-19 05:07:46,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4296870.0, ans=0.1 2024-08-19 05:07:49,428 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 21 from Vox, 29 from AS 2024-08-19 05:07:53,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4296870.0, ans=0.0 2024-08-19 05:08:14,759 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14100, loss[loss=0.07604, beats_loss=0.01284, ecapa_loss=0.000119, whisper_loss=0.06202, over 17253.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01052, ecapa_loss=0.0001418, whisper_loss=0.09004, over 3798287.89 frames. ], batch size: 68, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:08:17,505 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 18 from Vox, 32 from AS 2024-08-19 05:08:29,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. 
limit=15.0 2024-08-19 05:08:38,009 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.983e+01 2.319e+01 2.520e+01 2.882e+01 1.476e+02, threshold=5.041e+01, percent-clipped=1.0 2024-08-19 05:08:39,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4297270.0, ans=0.125 2024-08-19 05:08:43,462 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 05:08:44,745 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 15 from Vox, 37 from AS 2024-08-19 05:08:55,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4297370.0, ans=0.125 2024-08-19 05:08:56,069 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 32 from LS+wenet, 16 from Vox, 30 from AS 2024-08-19 05:08:56,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5 2024-08-19 05:09:03,397 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=4297370.0, ans=10.0 2024-08-19 05:09:04,309 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 15 from Vox, 40 from AS 2024-08-19 05:09:06,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4297470.0, ans=0.125 2024-08-19 05:09:07,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4297470.0, ans=0.0 2024-08-19 05:09:07,073 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4297470.0, ans=0.2 2024-08-19 05:09:12,993 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 15 from Vox, 24 from AS 2024-08-19 05:09:18,317 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 from AS 2024-08-19 05:09:18,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4297570.0, ans=0.125 2024-08-19 05:09:19,372 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14150, loss[loss=0.1101, beats_loss=0.00885, ecapa_loss=0.0001369, whisper_loss=0.0999, over 18811.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001414, whisper_loss=0.09045, over 3798424.56 frames. ], batch size: 75, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:09:19,599 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 26 from Vox, 34 from AS 2024-08-19 05:09:27,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4297570.0, ans=0.0 2024-08-19 05:09:44,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4297770.0, ans=0.125 2024-08-19 05:09:49,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4297770.0, ans=0.125 2024-08-19 05:09:55,591 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 31 from LS+wenet, 18 from Vox, 22 from AS 2024-08-19 05:09:55,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4297770.0, ans=0.125 2024-08-19 05:10:10,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4297970.0, ans=0.0 2024-08-19 05:10:24,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14200, loss[loss=0.1099, beats_loss=0.01094, ecapa_loss=0.000122, whisper_loss=0.09769, over 23585.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001428, whisper_loss=0.09015, over 3834922.76 frames. ], batch size: 92, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:10:24,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4298070.0, ans=0.125 2024-08-19 05:10:25,617 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 14 from Vox, 22 from AS 2024-08-19 05:10:35,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4298070.0, ans=0.2 2024-08-19 05:10:48,042 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.276e+01 2.492e+01 2.801e+01 5.821e+01, threshold=4.984e+01, percent-clipped=1.0 2024-08-19 05:11:02,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4298270.0, ans=0.125 2024-08-19 05:11:03,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.68 vs. limit=10.0 2024-08-19 05:11:07,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.81 vs. 
limit=15.0 2024-08-19 05:11:08,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4298370.0, ans=0.125 2024-08-19 05:11:14,023 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4298370.0, ans=0.2 2024-08-19 05:11:29,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4298570.0, ans=10.0 2024-08-19 05:11:29,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4298570.0, ans=0.05 2024-08-19 05:11:29,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14250, loss[loss=0.09355, beats_loss=0.01307, ecapa_loss=0.0001411, whisper_loss=0.07907, over 22887.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001415, whisper_loss=0.09054, over 3870549.87 frames. ], batch size: 93, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:11:41,713 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-08-19 05:11:48,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4298670.0, ans=0.0 2024-08-19 05:12:03,199 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4298770.0, ans=0.07 2024-08-19 05:12:08,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4298870.0, ans=0.0 2024-08-19 05:12:10,902 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=8.08 vs. limit=12.0 2024-08-19 05:12:21,617 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
28 from LS+wenet, 18 from Vox, 38 from AS 2024-08-19 05:12:21,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4298970.0, ans=0.125 2024-08-19 05:12:24,177 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 21 from LS+wenet, 19 from Vox, 28 from AS 2024-08-19 05:12:27,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4298970.0, ans=0.0 2024-08-19 05:12:29,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4298970.0, ans=0.1 2024-08-19 05:12:34,233 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14300, loss[loss=0.101, beats_loss=0.01243, ecapa_loss=0.0001173, whisper_loss=0.0874, over 22776.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01037, ecapa_loss=0.0001415, whisper_loss=0.09126, over 3889934.38 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:12:34,417 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 from AS 2024-08-19 05:12:51,757 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 from AS 2024-08-19 05:12:55,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4299170.0, ans=0.0 2024-08-19 05:12:57,973 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.328e+01 2.560e+01 2.862e+01 1.139e+02, threshold=5.121e+01, percent-clipped=2.0 2024-08-19 05:13:09,571 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-19 05:13:11,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4299270.0, ans=0.0 2024-08-19 05:13:19,384 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
23 from LS+wenet, 29 from Vox, 41 from AS 2024-08-19 05:13:19,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4299370.0, ans=0.1 2024-08-19 05:13:21,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2024-08-19 05:13:22,271 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:13:40,124 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14350, loss[loss=0.08222, beats_loss=0.01281, ecapa_loss=0.0001367, whisper_loss=0.06804, over 22340.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.0906, over 3908537.65 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:14:09,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4299770.0, ans=0.125 2024-08-19 05:14:16,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=4299770.0, ans=0.125 2024-08-19 05:14:43,946 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14400, loss[loss=0.0883, beats_loss=0.01303, ecapa_loss=0.0001521, whisper_loss=0.07375, over 15491.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001432, whisper_loss=0.09064, over 3907236.94 frames. ], batch size: 63, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:14:52,895 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 26 from Vox, 34 from AS 2024-08-19 05:15:03,739 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.70 vs. 
limit=10.0 2024-08-19 05:15:06,840 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.846e+01 2.240e+01 2.498e+01 2.789e+01 4.237e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 05:15:22,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4300370.0, ans=0.2 2024-08-19 05:15:33,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4300370.0, ans=0.125 2024-08-19 05:15:38,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4300470.0, ans=0.125 2024-08-19 05:15:48,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4300570.0, ans=0.125 2024-08-19 05:15:48,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 29, batch 14450, loss[loss=0.1039, beats_loss=0.008851, ecapa_loss=0.0001731, whisper_loss=0.09328, over 21194.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001434, whisper_loss=0.09033, over 3918890.71 frames. ], batch size: 90, lr: 2.09e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:15:56,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4300570.0, ans=0.1 2024-08-19 05:16:04,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.50 vs. limit=22.5 2024-08-19 05:16:06,079 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
20 from LS+wenet, 22 from Vox, 37 from AS 2024-08-19 05:16:12,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4300670.0, ans=0.5 2024-08-19 05:16:25,678 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4300870.0, ans=0.125 2024-08-19 05:16:27,842 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 15 from Vox, 23 from AS 2024-08-19 05:16:36,580 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4300870.0, ans=0.0 2024-08-19 05:17:20,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4300990.0, ans=0.09899494936611666 2024-08-19 05:17:20,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 0, loss[loss=0.08778, beats_loss=0.01036, ecapa_loss=0.0001467, whisper_loss=0.07595, over 19324.00 frames. ], tot_loss[loss=0.08778, beats_loss=0.01036, ecapa_loss=0.0001467, whisper_loss=0.07595, over 19324.00 frames. ], batch size: 77, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:17:20,844 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 05:17:59,464 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on ASR_libri: loss=0.2538, beats_loss=0, ecapa_loss=0.0005174, whisper_loss=0.2486, over 922467.00 frames. 2024-08-19 05:18:14,951 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on SV_voxceleb1: loss=0.003909, beats_loss=0, ecapa_loss=0.0003909, whisper_loss=0, over 939242.00 frames. 
2024-08-19 05:18:31,723 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.0576, 2.6635, 2.6762, 2.6400], device='cuda:3') 2024-08-19 05:19:05,211 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2698, 5.0279, 5.1796, 5.2184], device='cuda:3') 2024-08-19 05:20:09,450 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 05:20:09,454 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 05:20:22,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4300990.0, ans=0.125 2024-08-19 05:20:40,512 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2024-08-19 05:21:00,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4301190.0, ans=0.2 2024-08-19 05:21:16,718 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.897e+01 2.422e+01 2.650e+01 2.982e+01 4.420e+02, threshold=5.300e+01, percent-clipped=2.0 2024-08-19 05:21:17,667 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 19 from LS+wenet, 25 from Vox, 21 from AS 2024-08-19 05:21:38,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4301290.0, ans=0.125 2024-08-19 05:21:42,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4301290.0, ans=0.0 2024-08-19 05:21:55,443 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 28 from Vox, 28 from AS 2024-08-19 05:21:59,973 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS 2024-08-19 05:22:12,618 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 50, loss[loss=0.0987, beats_loss=0.006724, ecapa_loss=0.0001728, whisper_loss=0.09025, over 21554.00 frames. ], tot_loss[loss=0.1051, beats_loss=0.009209, ecapa_loss=0.0001462, whisper_loss=0.09439, over 904869.13 frames. ], batch size: 89, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:22:35,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4301590.0, ans=0.07 2024-08-19 05:23:09,454 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 22 from LS+wenet, 19 from Vox, 22 from AS 2024-08-19 05:23:14,801 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 20 from Vox, 42 from AS 2024-08-19 05:23:37,619 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 32 from Vox, 27 from AS 2024-08-19 05:23:49,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4301890.0, ans=0.125 2024-08-19 05:24:06,371 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 39 from LS+wenet, 17 from Vox, 32 from AS 2024-08-19 05:24:07,358 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 100, loss[loss=0.1326, beats_loss=0.008395, ecapa_loss=0.000167, whisper_loss=0.1225, over 22809.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.009132, ecapa_loss=0.0001437, whisper_loss=0.09251, over 1558257.91 frames. ], batch size: 88, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:24:13,501 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 27 from LS+wenet, 24 from Vox, 43 from AS 2024-08-19 05:24:19,710 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
29 from LS+wenet, 22 from Vox, 30 from AS 2024-08-19 05:24:31,269 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:25:02,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4302190.0, ans=0.1 2024-08-19 05:25:05,389 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.646e+01 2.905e+01 3.235e+01 6.271e+01, threshold=5.810e+01, percent-clipped=1.0 2024-08-19 05:25:10,928 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 05:25:21,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4302290.0, ans=0.125 2024-08-19 05:25:29,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4302290.0, ans=0.1 2024-08-19 05:25:33,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4302390.0, ans=0.0 2024-08-19 05:25:39,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4302390.0, ans=0.125 2024-08-19 05:25:50,932 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:25:51,604 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 150, loss[loss=0.09796, beats_loss=0.008586, ecapa_loss=0.0001714, whisper_loss=0.08766, over 16707.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.009235, ecapa_loss=0.0001452, whisper_loss=0.09125, over 2066178.39 frames. 
], batch size: 66, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:25:57,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4302490.0, ans=0.1 2024-08-19 05:26:11,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4302590.0, ans=0.09899494936611666 2024-08-19 05:26:41,125 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4302790.0, ans=0.125 2024-08-19 05:26:55,177 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 23 from Vox, 34 from AS 2024-08-19 05:27:05,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 200, loss[loss=0.1038, beats_loss=0.01016, ecapa_loss=0.0001168, whisper_loss=0.0925, over 17514.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.009397, ecapa_loss=0.0001448, whisper_loss=0.09127, over 2406141.14 frames. ], batch size: 66, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:27:11,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4302990.0, ans=0.1 2024-08-19 05:27:28,558 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4303090.0, ans=0.2 2024-08-19 05:27:42,100 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.865e+01 2.381e+01 2.550e+01 2.834e+01 4.054e+01, threshold=5.100e+01, percent-clipped=0.0 2024-08-19 05:27:50,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4303290.0, ans=0.125 2024-08-19 05:28:12,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 250, loss[loss=0.08908, beats_loss=0.01327, ecapa_loss=0.0001168, whisper_loss=0.07464, over 21427.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.009709, ecapa_loss=0.0001447, whisper_loss=0.0903, over 2730390.73 frames. ], batch size: 88, lr: 2.06e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 05:28:22,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4303490.0, ans=0.125 2024-08-19 05:28:27,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4303590.0, ans=0.2 2024-08-19 05:28:33,039 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 25 from LS+wenet, 24 from Vox, 44 from AS 2024-08-19 05:28:33,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4303590.0, ans=10.0 2024-08-19 05:28:35,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4303590.0, ans=0.1 2024-08-19 05:28:38,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4303690.0, ans=0.2 2024-08-19 05:28:46,651 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.09 vs. limit=10.0 2024-08-19 05:28:49,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.59 vs. limit=10.0 2024-08-19 05:29:04,273 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 14 from Vox, 34 from AS 2024-08-19 05:29:12,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4303890.0, ans=0.125 2024-08-19 05:29:15,019 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 300, loss[loss=0.09051, beats_loss=0.01223, ecapa_loss=0.0001359, whisper_loss=0.07692, over 20902.00 frames. 
], tot_loss[loss=0.1021, beats_loss=0.009871, ecapa_loss=0.0001434, whisper_loss=0.09075, over 2986136.86 frames. ], batch size: 82, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:29:28,041 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 from AS 2024-08-19 05:29:43,189 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 17 from Vox, 42 from AS 2024-08-19 05:29:43,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=4304190.0, ans=0.1 2024-08-19 05:29:44,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4304190.0, ans=0.125 2024-08-19 05:29:45,783 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4304190.0, ans=0.125 2024-08-19 05:29:47,831 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.657e+01 2.190e+01 2.363e+01 2.558e+01 4.313e+01, threshold=4.727e+01, percent-clipped=0.0 2024-08-19 05:29:48,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4304190.0, ans=0.125 2024-08-19 05:29:56,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4304290.0, ans=0.125 2024-08-19 05:29:58,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4304290.0, ans=0.09899494936611666 2024-08-19 05:30:03,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4304290.0, ans=0.2 2024-08-19 05:30:04,492 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 39 from LS+wenet, 13 from Vox, 20 from AS 2024-08-19 05:30:09,473 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 05:30:11,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2024-08-19 05:30:12,380 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.67 vs. limit=10.0 2024-08-19 05:30:13,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4304390.0, ans=0.2 2024-08-19 05:30:17,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 350, loss[loss=0.1068, beats_loss=0.01065, ecapa_loss=0.0001434, whisper_loss=0.09469, over 18522.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01012, ecapa_loss=0.0001424, whisper_loss=0.09004, over 3186955.78 frames. ], batch size: 74, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:30:23,233 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 05:30:31,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4304590.0, ans=0.0 2024-08-19 05:30:34,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4304590.0, ans=0.035 2024-08-19 05:30:44,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4304690.0, ans=0.1 2024-08-19 05:31:01,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4304790.0, ans=0.125 2024-08-19 05:31:04,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4304790.0, ans=0.0 2024-08-19 05:31:05,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4304790.0, ans=0.2 2024-08-19 05:31:06,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4304890.0, ans=0.125 2024-08-19 05:31:19,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 400, loss[loss=0.1009, beats_loss=0.009075, ecapa_loss=0.0001649, whisper_loss=0.09015, over 19034.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01024, ecapa_loss=0.0001417, whisper_loss=0.08934, over 3319499.88 frames. ], batch size: 78, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:31:22,953 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4304990.0, ans=0.0 2024-08-19 05:31:25,070 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 05:31:41,260 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
32 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 05:31:51,905 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.222e+01 2.400e+01 2.669e+01 1.087e+02, threshold=4.801e+01, percent-clipped=1.0 2024-08-19 05:32:01,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4305290.0, ans=0.125 2024-08-19 05:32:21,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4305490.0, ans=0.07 2024-08-19 05:32:21,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 450, loss[loss=0.09819, beats_loss=0.00913, ecapa_loss=0.0001727, whisper_loss=0.08733, over 18259.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01027, ecapa_loss=0.0001426, whisper_loss=0.08863, over 3413540.42 frames. ], batch size: 75, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:33:07,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4305790.0, ans=0.125 2024-08-19 05:33:09,349 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 05:33:20,487 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 05:33:24,076 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 500, loss[loss=0.093, beats_loss=0.01182, ecapa_loss=0.0001438, whisper_loss=0.07974, over 23492.00 frames. ], tot_loss[loss=0.0998, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.08804, over 3507279.64 frames. 
], batch size: 95, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:33:24,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4305990.0, ans=0.125 2024-08-19 05:33:24,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4305990.0, ans=0.0 2024-08-19 05:33:29,327 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 23 from LS+wenet, 18 from Vox, 54 fro AS 2024-08-19 05:33:56,744 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.612e+01 2.301e+01 2.420e+01 2.660e+01 4.195e+01, threshold=4.841e+01, percent-clipped=0.0 2024-08-19 05:33:59,537 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 05:34:03,022 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 05:34:06,344 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.40 vs. limit=22.5 2024-08-19 05:34:09,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4306290.0, ans=0.95 2024-08-19 05:34:12,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4306290.0, ans=0.2 2024-08-19 05:34:15,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4306390.0, ans=0.125 2024-08-19 05:34:26,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 550, loss[loss=0.1037, beats_loss=0.008922, ecapa_loss=0.0001495, whisper_loss=0.09332, over 15324.00 frames. ], tot_loss[loss=0.1003, beats_loss=0.01037, ecapa_loss=0.000141, whisper_loss=0.08852, over 3599994.66 frames. 
], batch size: 59, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:34:31,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4306490.0, ans=0.0 2024-08-19 05:34:47,971 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4306590.0, ans=0.125 2024-08-19 05:34:49,194 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4306590.0, ans=0.125 2024-08-19 05:34:58,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4306690.0, ans=0.125 2024-08-19 05:34:59,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4306690.0, ans=0.04949747468305833 2024-08-19 05:35:05,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4306790.0, ans=0.1 2024-08-19 05:35:12,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.76 vs. limit=10.0 2024-08-19 05:35:14,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4306790.0, ans=0.125 2024-08-19 05:35:28,492 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 600, loss[loss=0.1326, beats_loss=0.006968, ecapa_loss=0.0001521, whisper_loss=0.1241, over 18615.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01035, ecapa_loss=0.0001398, whisper_loss=0.08931, over 3634810.36 frames. 
], batch size: 70, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:35:35,884 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2024-08-19 05:35:37,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4306990.0, ans=0.2 2024-08-19 05:35:38,890 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 05:35:44,042 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4307090.0, ans=0.125 2024-08-19 05:35:49,250 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-19 05:35:49,961 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 15 from Vox, 18 fro AS 2024-08-19 05:35:52,779 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 05:35:55,285 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 12 from Vox, 37 fro AS 2024-08-19 05:36:01,144 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.934e+01 2.277e+01 2.491e+01 2.795e+01 3.103e+02, threshold=4.982e+01, percent-clipped=2.0 2024-08-19 05:36:02,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4307190.0, ans=0.0 2024-08-19 05:36:09,750 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 05:36:15,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4307290.0, ans=0.0 2024-08-19 05:36:16,261 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4307290.0, ans=0.125 2024-08-19 05:36:18,653 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 05:36:23,429 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 29 from Vox, 22 fro AS 2024-08-19 05:36:30,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 650, loss[loss=0.09392, beats_loss=0.01231, ecapa_loss=0.0001097, whisper_loss=0.08052, over 15521.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01036, ecapa_loss=0.00014, whisper_loss=0.08876, over 3654369.82 frames. ], batch size: 58, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:36:32,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4307490.0, ans=0.2 2024-08-19 05:36:37,032 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 05:36:44,734 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2024-08-19 05:36:45,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4307590.0, ans=0.2 2024-08-19 05:36:52,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.16 vs. limit=6.0 2024-08-19 05:36:53,184 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 20 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 05:36:56,622 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
16 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 05:37:13,868 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 05:37:19,094 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4307890.0, ans=0.125 2024-08-19 05:37:22,129 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 05:37:28,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4307890.0, ans=0.125 2024-08-19 05:37:32,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 700, loss[loss=0.08658, beats_loss=0.0145, ecapa_loss=8.286e-05, whisper_loss=0.07125, over 17633.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01036, ecapa_loss=0.0001422, whisper_loss=0.08839, over 3685365.35 frames. ], batch size: 67, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:37:33,939 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.587e+01 2024-08-19 05:37:34,076 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.23 vs. limit=22.5 2024-08-19 05:37:41,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4307990.0, ans=0.2 2024-08-19 05:37:50,424 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=22.5 2024-08-19 05:38:03,914 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.712e+01 2.327e+01 2.528e+01 2.779e+01 3.860e+01, threshold=5.056e+01, percent-clipped=1.0 2024-08-19 05:38:05,194 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 
18 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 05:38:05,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4308190.0, ans=0.125 2024-08-19 05:38:14,469 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 05:38:15,643 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 17 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 05:38:21,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4308390.0, ans=0.2 2024-08-19 05:38:21,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4308390.0, ans=0.1 2024-08-19 05:38:33,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 750, loss[loss=0.1064, beats_loss=0.01014, ecapa_loss=0.0001307, whisper_loss=0.09495, over 23415.00 frames. ], tot_loss[loss=0.09969, beats_loss=0.01047, ecapa_loss=0.0001406, whisper_loss=0.08781, over 3699429.01 frames. ], batch size: 92, lr: 2.06e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:38:42,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4308490.0, ans=0.0 2024-08-19 05:38:43,570 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 05:38:49,085 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 05:38:49,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4308590.0, ans=0.125 2024-08-19 05:38:51,628 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
28 from LS+wenet, 21 from Vox, 33 fro AS 2024-08-19 05:38:56,867 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0 2024-08-19 05:39:01,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4308690.0, ans=0.125 2024-08-19 05:39:03,308 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 23 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 05:39:11,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4308790.0, ans=0.05 2024-08-19 05:39:12,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.73 vs. limit=12.0 2024-08-19 05:39:24,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4308890.0, ans=0.0 2024-08-19 05:39:27,986 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 24 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 05:39:35,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 800, loss[loss=0.1117, beats_loss=0.01031, ecapa_loss=0.0001414, whisper_loss=0.09997, over 22802.00 frames. ], tot_loss[loss=0.09994, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.08806, over 3730900.78 frames. 
], batch size: 92, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:39:36,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4308990.0, ans=0.05 2024-08-19 05:39:49,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4309090.0, ans=0.125 2024-08-19 05:40:00,390 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.20 vs. limit=22.5 2024-08-19 05:40:07,652 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.249e+01 2.414e+01 2.640e+01 3.694e+01, threshold=4.828e+01, percent-clipped=0.0 2024-08-19 05:40:09,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4309190.0, ans=0.125 2024-08-19 05:40:11,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4309290.0, ans=0.125 2024-08-19 05:40:19,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4309290.0, ans=0.1 2024-08-19 05:40:22,637 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 05:40:25,299 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 05:40:29,092 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 05:40:34,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4309390.0, ans=0.125 2024-08-19 05:40:37,928 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 850, loss[loss=0.08351, beats_loss=0.009926, ecapa_loss=0.00021, whisper_loss=0.07149, over 20337.00 frames. 
], tot_loss[loss=0.1007, beats_loss=0.01042, ecapa_loss=0.0001401, whisper_loss=0.08883, over 3750987.76 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:40:42,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4309490.0, ans=0.125 2024-08-19 05:41:05,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4309690.0, ans=0.0 2024-08-19 05:41:14,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4309790.0, ans=0.0 2024-08-19 05:41:37,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4309890.0, ans=0.0 2024-08-19 05:41:37,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4309890.0, ans=0.1 2024-08-19 05:41:38,857 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4309890.0, ans=0.0 2024-08-19 05:41:40,998 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 900, loss[loss=0.1004, beats_loss=0.01009, ecapa_loss=0.0001457, whisper_loss=0.08884, over 20846.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01036, ecapa_loss=0.0001406, whisper_loss=0.08848, over 3738051.55 frames. ], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:41:48,187 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-08-19 05:42:07,947 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 05:42:08,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4310190.0, ans=0.0 2024-08-19 05:42:14,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.707e+01 2.290e+01 2.531e+01 2.891e+01 5.693e+01, threshold=5.062e+01, percent-clipped=1.0 2024-08-19 05:42:23,830 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4310290.0, ans=0.025 2024-08-19 05:42:24,585 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 18 from Vox, 23 fro AS 2024-08-19 05:42:30,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4310290.0, ans=0.1 2024-08-19 05:42:30,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4310290.0, ans=0.125 2024-08-19 05:42:35,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4310390.0, ans=0.1 2024-08-19 05:42:39,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.99 vs. limit=22.5 2024-08-19 05:42:45,443 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 950, loss[loss=0.1142, beats_loss=0.009396, ecapa_loss=0.0001039, whisper_loss=0.1038, over 18188.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01038, ecapa_loss=0.0001399, whisper_loss=0.08828, over 3747396.80 frames. 
], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:42:54,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4310490.0, ans=0.5 2024-08-19 05:43:01,026 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:43:01,034 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4310590.0, ans=0.0 2024-08-19 05:43:06,478 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4310590.0, ans=0.1 2024-08-19 05:43:29,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4310790.0, ans=0.125 2024-08-19 05:43:30,164 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 05:43:31,779 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 05:43:36,746 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 05:43:37,544 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2024-08-19 05:43:40,343 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 21 from Vox, 22 fro AS 2024-08-19 05:43:40,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4310890.0, ans=0.1 2024-08-19 05:43:46,364 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
32 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 05:43:49,830 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1000, loss[loss=0.07477, beats_loss=0.01239, ecapa_loss=0.0001308, whisper_loss=0.06107, over 17042.00 frames. ], tot_loss[loss=0.09997, beats_loss=0.0104, ecapa_loss=0.0001398, whisper_loss=0.08817, over 3765362.44 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:43:59,789 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 05:44:02,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4311090.0, ans=0.05 2024-08-19 05:44:22,965 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 05:44:24,335 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.215e+01 2.410e+01 2.644e+01 3.685e+01, threshold=4.819e+01, percent-clipped=0.0 2024-08-19 05:44:39,431 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=12.0 2024-08-19 05:44:41,943 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 18 from Vox, 48 fro AS 2024-08-19 05:44:46,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4311390.0, ans=0.04949747468305833 2024-08-19 05:44:50,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.66 vs. limit=8.0 2024-08-19 05:44:56,081 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1050, loss[loss=0.1188, beats_loss=0.01139, ecapa_loss=9.816e-05, whisper_loss=0.1064, over 18738.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01042, ecapa_loss=0.0001383, whisper_loss=0.08888, over 3812429.60 frames. 
], batch size: 68, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:45:16,459 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 05:45:47,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4311890.0, ans=0.125 2024-08-19 05:45:50,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4311890.0, ans=0.2 2024-08-19 05:45:57,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0 2024-08-19 05:46:00,269 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=15.0 2024-08-19 05:46:01,919 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1100, loss[loss=0.1092, beats_loss=0.01024, ecapa_loss=0.0001571, whisper_loss=0.09735, over 19365.00 frames. ], tot_loss[loss=0.1001, beats_loss=0.01042, ecapa_loss=0.0001379, whisper_loss=0.08832, over 3809473.04 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:46:26,391 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 05:46:26,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4312090.0, ans=0.2 2024-08-19 05:46:26,972 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=12.0 2024-08-19 05:46:35,718 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
20 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 05:46:36,715 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.799e+01 2.284e+01 2.503e+01 2.808e+01 4.234e+01, threshold=5.006e+01, percent-clipped=0.0 2024-08-19 05:46:46,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4312290.0, ans=0.0 2024-08-19 05:46:50,080 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-19 05:46:56,621 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4312390.0, ans=0.125 2024-08-19 05:46:59,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4312390.0, ans=0.2 2024-08-19 05:46:59,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. limit=15.0 2024-08-19 05:47:10,042 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1150, loss[loss=0.1221, beats_loss=0.008831, ecapa_loss=0.0001577, whisper_loss=0.1117, over 15636.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01035, ecapa_loss=0.0001389, whisper_loss=0.0891, over 3825441.87 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:47:19,962 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 19 from LS+wenet, 16 from Vox, 18 fro AS 2024-08-19 05:47:21,509 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 05:47:26,768 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 05:47:28,363 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
21 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 05:47:37,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4312690.0, ans=0.125 2024-08-19 05:47:38,934 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 18 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 05:47:40,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4312690.0, ans=0.5 2024-08-19 05:47:42,790 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 13 from LS+wenet, 16 from Vox, 38 fro AS 2024-08-19 05:47:44,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4312690.0, ans=0.2 2024-08-19 05:48:06,169 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.27 vs. limit=22.5 2024-08-19 05:48:13,018 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 05:48:16,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4312890.0, ans=0.125 2024-08-19 05:48:21,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1200, loss[loss=0.06721, beats_loss=0.01082, ecapa_loss=0.0001348, whisper_loss=0.05504, over 16032.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01038, ecapa_loss=0.0001396, whisper_loss=0.08924, over 3796382.25 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:48:23,302 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 30 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 05:48:23,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2024-08-19 05:48:37,775 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
24 from LS+wenet, 19 from Vox, 17 fro AS 2024-08-19 05:48:41,229 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 05:48:48,525 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 05:48:55,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4313190.0, ans=0.0 2024-08-19 05:48:59,729 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.227e+01 2.498e+01 2.672e+01 3.472e+01, threshold=4.995e+01, percent-clipped=0.0 2024-08-19 05:49:28,400 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 31 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 05:49:32,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2024-08-19 05:49:34,271 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1250, loss[loss=0.09757, beats_loss=0.008714, ecapa_loss=0.0001542, whisper_loss=0.08732, over 17409.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01039, ecapa_loss=0.0001398, whisper_loss=0.08971, over 3797914.76 frames. ], batch size: 69, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:49:35,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4313490.0, ans=0.1 2024-08-19 05:50:28,097 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 05:50:40,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4313890.0, ans=0.125 2024-08-19 05:50:47,974 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.86 vs. 
limit=15.0 2024-08-19 05:50:48,235 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1300, loss[loss=0.09994, beats_loss=0.01082, ecapa_loss=0.0001318, whisper_loss=0.08781, over 18990.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08884, over 3770697.77 frames. ], batch size: 78, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:51:02,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4314090.0, ans=0.0 2024-08-19 05:51:11,154 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 24 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 05:51:12,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4314090.0, ans=0.125 2024-08-19 05:51:26,425 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.752e+01 2.267e+01 2.463e+01 2.675e+01 4.452e+01, threshold=4.926e+01, percent-clipped=0.0 2024-08-19 05:51:29,661 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 18 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 05:51:34,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4314290.0, ans=0.0 2024-08-19 05:51:36,781 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 05:51:39,527 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 05:51:41,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.50 vs. 
limit=15.0 2024-08-19 05:51:54,460 WARNING [optim.py:496] (3/4) Scaling gradients by 0.009816886857151985, model_norm_threshold=49.264041900634766 2024-08-19 05:51:54,636 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.0.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.852e+06, grad_sumsq=3.852e+06, orig_rms_sq=1.000e+00 2024-08-19 05:51:55,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4314390.0, ans=0.1 2024-08-19 05:51:58,403 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.30 vs. limit=6.0 2024-08-19 05:52:02,067 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1350, loss[loss=0.1118, beats_loss=0.008434, ecapa_loss=0.000116, whisper_loss=0.1022, over 20255.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01038, ecapa_loss=0.0001391, whisper_loss=0.08959, over 3808911.11 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:52:07,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4314490.0, ans=0.0 2024-08-19 05:52:21,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-08-19 05:52:29,694 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
23 from LS+wenet, 29 from Vox, 26 fro AS 2024-08-19 05:52:36,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4314690.0, ans=0.0 2024-08-19 05:52:42,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4314690.0, ans=0.125 2024-08-19 05:52:44,590 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 05:53:18,923 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1400, loss[loss=0.1134, beats_loss=0.008995, ecapa_loss=0.0001353, whisper_loss=0.103, over 19859.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01031, ecapa_loss=0.0001398, whisper_loss=0.09016, over 3800550.74 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:53:20,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4314990.0, ans=0.0 2024-08-19 05:53:38,442 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4315090.0, ans=0.125 2024-08-19 05:53:39,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4315090.0, ans=0.125 2024-08-19 05:53:42,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4315090.0, ans=0.0 2024-08-19 05:53:53,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4315190.0, ans=0.125 2024-08-19 05:53:57,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.218e+01 2.484e+01 2.837e+01 5.018e+03, threshold=4.968e+01, percent-clipped=1.0 2024-08-19 05:54:01,145 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4315190.0, 
ans=0.125 2024-08-19 05:54:04,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4315290.0, ans=0.125 2024-08-19 05:54:06,559 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 05:54:15,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4315290.0, ans=0.125 2024-08-19 05:54:53,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1450, loss[loss=0.08518, beats_loss=0.01246, ecapa_loss=0.0001321, whisper_loss=0.0714, over 14283.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01039, ecapa_loss=0.0001396, whisper_loss=0.08934, over 3791955.69 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:55:01,217 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 05:55:03,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4315490.0, ans=0.125 2024-08-19 05:55:03,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.99 vs. limit=12.0 2024-08-19 05:55:05,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4315490.0, ans=0.2 2024-08-19 05:55:18,011 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 25 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 05:55:21,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4315690.0, ans=0.0 2024-08-19 05:55:38,002 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.75 vs. 
limit=12.0 2024-08-19 05:56:03,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4315990.0, ans=0.125 2024-08-19 05:56:04,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1500, loss[loss=0.1061, beats_loss=0.01001, ecapa_loss=0.000139, whisper_loss=0.09468, over 20205.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01038, ecapa_loss=0.000139, whisper_loss=0.08898, over 3799338.49 frames. ], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:56:05,764 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 05:56:09,635 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 05:56:36,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4316190.0, ans=0.125 2024-08-19 05:56:40,286 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.207e+01 2.443e+01 2.756e+01 5.889e+01, threshold=4.886e+01, percent-clipped=1.0 2024-08-19 05:56:42,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-19 05:56:49,207 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 24 from LS+wenet, 28 from Vox, 27 fro AS 2024-08-19 05:56:55,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4316290.0, ans=0.125 2024-08-19 05:56:56,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4316290.0, ans=0.2 2024-08-19 05:57:14,341 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1550, loss[loss=0.1002, beats_loss=0.007743, ecapa_loss=0.0001473, whisper_loss=0.09098, over 15917.00 frames. 
], tot_loss[loss=0.1016, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.08981, over 3817112.65 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:57:25,475 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.88 vs. limit=10.0 2024-08-19 05:57:30,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4316590.0, ans=0.1 2024-08-19 05:57:35,214 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4316590.0, ans=0.125 2024-08-19 05:57:36,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4316590.0, ans=0.0 2024-08-19 05:57:51,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4316690.0, ans=0.125 2024-08-19 05:58:01,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4316790.0, ans=0.0 2024-08-19 05:58:04,905 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0 2024-08-19 05:58:09,450 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 19 from Vox, 19 fro AS 2024-08-19 05:58:22,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4316890.0, ans=0.125 2024-08-19 05:58:22,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.40 vs. 
limit=10.0 2024-08-19 05:58:24,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1600, loss[loss=0.06439, beats_loss=0.01223, ecapa_loss=0.0001264, whisper_loss=0.05089, over 16038.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01033, ecapa_loss=0.0001374, whisper_loss=0.09031, over 3838800.70 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:58:31,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4316990.0, ans=0.125 2024-08-19 05:58:40,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4317090.0, ans=0.125 2024-08-19 05:58:49,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4317090.0, ans=0.1 2024-08-19 05:58:54,782 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 05:58:58,952 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 05:58:59,943 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.861e+01 2.330e+01 2.560e+01 2.867e+01 4.282e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 05:59:05,687 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 26 from Vox, 41 fro AS 2024-08-19 05:59:06,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4317290.0, ans=0.125 2024-08-19 05:59:26,017 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 17 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 05:59:27,175 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
26 from LS+wenet, 16 from Vox, 17 fro AS 2024-08-19 05:59:31,770 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1650, loss[loss=0.1102, beats_loss=0.009114, ecapa_loss=0.0001494, whisper_loss=0.09963, over 16354.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01036, ecapa_loss=0.0001381, whisper_loss=0.09041, over 3841897.94 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 05:59:43,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4317490.0, ans=0.2 2024-08-19 05:59:48,151 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0 2024-08-19 06:00:01,207 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:00:10,826 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4317790.0, ans=0.0 2024-08-19 06:00:17,335 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.39 vs. limit=15.0 2024-08-19 06:00:18,996 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 06:00:37,912 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1700, loss[loss=0.08713, beats_loss=0.01084, ecapa_loss=0.0001506, whisper_loss=0.07479, over 18083.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01032, ecapa_loss=0.0001386, whisper_loss=0.09018, over 3820221.86 frames. 
], batch size: 76, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:00:51,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4318090.0, ans=0.125 2024-08-19 06:01:11,377 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.261e+01 2.522e+01 2.818e+01 7.809e+01, threshold=5.044e+01, percent-clipped=1.0 2024-08-19 06:01:21,114 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4318290.0, ans=0.125 2024-08-19 06:01:44,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4318490.0, ans=0.0 2024-08-19 06:01:45,066 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1750, loss[loss=0.1018, beats_loss=0.01028, ecapa_loss=0.0001298, whisper_loss=0.09027, over 23563.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01029, ecapa_loss=0.0001389, whisper_loss=0.0898, over 3806065.06 frames. ], batch size: 92, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:01:56,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4318490.0, ans=0.0 2024-08-19 06:02:27,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0 2024-08-19 06:02:45,061 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4318790.0, ans=0.125 2024-08-19 06:02:45,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4318790.0, ans=0.125 2024-08-19 06:02:54,675 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 06:02:56,262 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
27 from LS+wenet, 22 from Vox, 28 fro AS 2024-08-19 06:02:58,452 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=15.0 2024-08-19 06:03:03,157 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1800, loss[loss=0.1111, beats_loss=0.006419, ecapa_loss=0.0001719, whisper_loss=0.103, over 16872.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001384, whisper_loss=0.08973, over 3838009.17 frames. ], batch size: 65, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:03:20,056 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 13 from Vox, 32 fro AS 2024-08-19 06:03:34,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4319090.0, ans=0.125 2024-08-19 06:03:44,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4319190.0, ans=0.09899494936611666 2024-08-19 06:03:49,011 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.229e+01 2.414e+01 2.728e+01 1.792e+02, threshold=4.829e+01, percent-clipped=1.0 2024-08-19 06:03:54,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4319190.0, ans=0.0 2024-08-19 06:04:06,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4319290.0, ans=0.125 2024-08-19 06:04:06,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4319290.0, ans=0.05 2024-08-19 06:04:10,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4319290.0, ans=0.0 2024-08-19 06:04:15,836 INFO [scaling.py:214] (3/4) ScheduledFloat: 
name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4319390.0, ans=0.0 2024-08-19 06:04:15,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4319390.0, ans=0.1 2024-08-19 06:04:18,634 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 06:04:35,213 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1850, loss[loss=0.1066, beats_loss=0.00945, ecapa_loss=0.0001232, whisper_loss=0.09595, over 15997.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0104, ecapa_loss=0.0001374, whisper_loss=0.0889, over 3812506.12 frames. ], batch size: 57, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:04:40,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4319490.0, ans=0.0 2024-08-19 06:04:45,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4319490.0, ans=0.125 2024-08-19 06:05:00,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2024-08-19 06:05:13,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=15.0 2024-08-19 06:05:14,884 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 06:05:18,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4319690.0, ans=0.5 2024-08-19 06:05:22,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4319690.0, ans=0.125 2024-08-19 06:05:45,595 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
16 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 06:06:01,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4319890.0, ans=0.125 2024-08-19 06:06:14,328 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1900, loss[loss=0.09902, beats_loss=0.008857, ecapa_loss=0.0001151, whisper_loss=0.08902, over 15481.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001365, whisper_loss=0.08891, over 3825685.81 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:06:22,963 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4319990.0, ans=0.1 2024-08-19 06:06:27,898 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4319990.0, ans=0.125 2024-08-19 06:06:32,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4319990.0, ans=0.125 2024-08-19 06:07:13,367 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.747e+01 2.248e+01 2.499e+01 2.694e+01 3.637e+01, threshold=4.999e+01, percent-clipped=0.0 2024-08-19 06:07:17,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4320190.0, ans=0.125 2024-08-19 06:07:33,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.77 vs. 
limit=15.0 2024-08-19 06:07:41,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4320290.0, ans=0.0 2024-08-19 06:08:09,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 1950, loss[loss=0.1147, beats_loss=0.01109, ecapa_loss=0.0001137, whisper_loss=0.1025, over 19273.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01045, ecapa_loss=0.0001367, whisper_loss=0.0889, over 3833458.82 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:08:21,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4320490.0, ans=0.0 2024-08-19 06:08:21,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.41 vs. limit=10.0 2024-08-19 06:08:25,495 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 06:08:34,196 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 06:08:43,038 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-19 06:09:14,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4320690.0, ans=0.0 2024-08-19 06:09:24,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4320790.0, ans=0.125 2024-08-19 06:09:37,308 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=4320790.0, ans=10.0 2024-08-19 06:09:44,164 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 9 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 06:09:51,949 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 06:09:53,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4320890.0, ans=0.125 2024-08-19 06:10:07,812 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2000, loss[loss=0.09396, beats_loss=0.01068, ecapa_loss=0.0001309, whisper_loss=0.08197, over 17111.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01042, ecapa_loss=0.0001367, whisper_loss=0.08925, over 3843590.10 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:10:47,938 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 18 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 06:11:04,805 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.942e+01 2.279e+01 2.537e+01 2.810e+01 5.508e+01, threshold=5.074e+01, percent-clipped=1.0 2024-08-19 06:11:40,486 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2050, loss[loss=0.08098, beats_loss=0.01246, ecapa_loss=0.0001368, whisper_loss=0.06716, over 15928.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.0001375, whisper_loss=0.08966, over 3833001.56 frames. ], batch size: 63, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:11:58,175 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4321590.0, ans=0.125 2024-08-19 06:12:01,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4321590.0, ans=0.04949747468305833 2024-08-19 06:12:08,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4321590.0, ans=0.125 2024-08-19 06:12:22,502 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-19 06:12:23,050 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.00 vs. limit=6.0 2024-08-19 06:12:28,181 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.89 vs. limit=8.0 2024-08-19 06:12:56,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4321990.0, ans=0.035 2024-08-19 06:12:57,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2100, loss[loss=0.1063, beats_loss=0.01063, ecapa_loss=0.0001513, whisper_loss=0.09411, over 17298.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01039, ecapa_loss=0.0001357, whisper_loss=0.08916, over 3825883.20 frames. ], batch size: 70, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:12:57,807 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 06:13:00,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4321990.0, ans=0.5 2024-08-19 06:13:05,898 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 13 from Vox, 31 fro AS 2024-08-19 06:13:21,946 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4322090.0, ans=0.0 2024-08-19 06:13:39,215 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.726e+01 2.264e+01 2.459e+01 2.750e+01 5.104e+01, threshold=4.918e+01, percent-clipped=1.0 2024-08-19 06:13:53,671 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
20 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 06:14:08,799 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4322390.0, ans=0.0 2024-08-19 06:14:17,392 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2150, loss[loss=0.1158, beats_loss=0.008508, ecapa_loss=0.0001537, whisper_loss=0.1057, over 20793.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0104, ecapa_loss=0.0001355, whisper_loss=0.08958, over 3825966.48 frames. ], batch size: 85, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:14:32,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4322590.0, ans=0.1 2024-08-19 06:14:35,562 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.67 vs. limit=15.0 2024-08-19 06:14:36,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4322590.0, ans=0.09899494936611666 2024-08-19 06:14:41,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4322590.0, ans=0.125 2024-08-19 06:14:48,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4322690.0, ans=0.04949747468305833 2024-08-19 06:14:53,456 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
25 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 06:14:56,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4322690.0, ans=0.125 2024-08-19 06:15:05,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4322790.0, ans=0.125 2024-08-19 06:15:14,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4322790.0, ans=0.0 2024-08-19 06:15:38,170 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2200, loss[loss=0.09954, beats_loss=0.01163, ecapa_loss=0.0001236, whisper_loss=0.08667, over 22999.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001356, whisper_loss=0.09018, over 3837174.28 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:15:42,762 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 06:16:08,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4323190.0, ans=0.125 2024-08-19 06:16:09,741 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.73 vs. 
limit=15.0 2024-08-19 06:16:17,794 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.245e+01 2.459e+01 2.668e+01 3.758e+01, threshold=4.917e+01, percent-clipped=0.0 2024-08-19 06:16:20,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4323190.0, ans=0.09899494936611666 2024-08-19 06:16:25,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4323290.0, ans=0.0 2024-08-19 06:16:44,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4323390.0, ans=0.2 2024-08-19 06:16:51,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4323390.0, ans=0.1 2024-08-19 06:16:53,215 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 19 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 06:16:55,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4323490.0, ans=0.125 2024-08-19 06:16:56,482 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2250, loss[loss=0.1192, beats_loss=0.009908, ecapa_loss=0.0001518, whisper_loss=0.1078, over 18702.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001377, whisper_loss=0.09098, over 3826087.27 frames. ], batch size: 76, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:17:04,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.25 vs. 
limit=6.0 2024-08-19 06:17:07,017 WARNING [optim.py:496] (3/4) Scaling gradients by 0.027778564020991325, model_norm_threshold=49.17220687866211 2024-08-19 06:17:07,181 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.0.norm.log_scale with proportion 0.24, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.536e+05, grad_sumsq=7.536e+05, orig_rms_sq=1.000e+00 2024-08-19 06:17:26,554 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4323690.0, ans=0.125 2024-08-19 06:17:48,690 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 12 from Vox, 26 fro AS 2024-08-19 06:18:05,204 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 06:18:08,067 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 06:18:12,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=4323890.0, ans=15.0 2024-08-19 06:18:14,400 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2300, loss[loss=0.1017, beats_loss=0.01057, ecapa_loss=0.0001367, whisper_loss=0.08973, over 23242.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.0104, ecapa_loss=0.0001383, whisper_loss=0.09103, over 3858067.21 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 06:18:22,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4323990.0, ans=0.125 2024-08-19 06:18:31,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4324090.0, ans=0.1 2024-08-19 06:18:33,910 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
24 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 06:18:40,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4324090.0, ans=0.125 2024-08-19 06:18:44,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4324190.0, ans=0.125 2024-08-19 06:18:44,956 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:18:53,068 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.839e+01 2.282e+01 2.493e+01 2.821e+01 1.770e+03, threshold=4.986e+01, percent-clipped=1.0 2024-08-19 06:19:07,139 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 06:19:09,495 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:19:11,914 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 20 from Vox, 45 fro AS 2024-08-19 06:19:25,598 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4324390.0, ans=0.1 2024-08-19 06:19:31,978 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2350, loss[loss=0.1068, beats_loss=0.01084, ecapa_loss=0.0001533, whisper_loss=0.0944, over 21861.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01038, ecapa_loss=0.0001386, whisper_loss=0.09168, over 3845726.89 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 1.152921504606847e+18 2024-08-19 06:19:40,390 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
17 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 06:20:26,082 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4324790.0, ans=0.1 2024-08-19 06:20:28,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4324790.0, ans=0.125 2024-08-19 06:20:30,293 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 06:20:37,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4324890.0, ans=0.0 2024-08-19 06:20:38,952 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=9.207e-02 2024-08-19 06:20:39,792 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 24 from Vox, 44 fro AS 2024-08-19 06:20:49,907 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2400, loss[loss=0.1122, beats_loss=0.008289, ecapa_loss=0.0001409, whisper_loss=0.1025, over 15678.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001389, whisper_loss=0.091, over 3853815.34 frames. 
], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:21:08,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4325090.0, ans=0.0 2024-08-19 06:21:13,360 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4325090.0, ans=0.2 2024-08-19 06:21:30,498 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.266e+01 2.549e+01 2.810e+01 4.372e+01, threshold=5.098e+01, percent-clipped=0.0 2024-08-19 06:21:58,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4325390.0, ans=0.05 2024-08-19 06:21:59,938 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4325390.0, ans=0.125 2024-08-19 06:22:06,951 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2450, loss[loss=0.06882, beats_loss=0.01133, ecapa_loss=0.0001588, whisper_loss=0.05591, over 12551.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01033, ecapa_loss=0.0001402, whisper_loss=0.09058, over 3873214.17 frames. ], batch size: 54, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:22:13,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4325490.0, ans=0.1 2024-08-19 06:22:13,631 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. 
limit=15.0 2024-08-19 06:22:18,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4325490.0, ans=0.125 2024-08-19 06:22:35,803 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4325590.0, ans=0.1 2024-08-19 06:22:50,795 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 06:23:01,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2024-08-19 06:23:01,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=4325790.0, ans=15.0 2024-08-19 06:23:24,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2500, loss[loss=0.09766, beats_loss=0.01043, ecapa_loss=0.0001263, whisper_loss=0.08596, over 15074.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001402, whisper_loss=0.09084, over 3851823.28 frames. ], batch size: 59, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:23:32,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4325990.0, ans=0.0 2024-08-19 06:24:05,721 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 06:24:06,659 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.795e+01 2.293e+01 2.574e+01 2.771e+01 4.931e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 06:24:30,399 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
26 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 06:24:38,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=4326390.0, ans=0.025 2024-08-19 06:24:45,746 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2550, loss[loss=0.1236, beats_loss=0.00785, ecapa_loss=0.0001517, whisper_loss=0.1142, over 20593.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01035, ecapa_loss=0.0001385, whisper_loss=0.09074, over 3861031.79 frames. ], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:24:46,412 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 06:24:48,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4326490.0, ans=0.0 2024-08-19 06:25:07,338 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.98 vs. limit=15.0 2024-08-19 06:25:13,765 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
28 from LS+wenet, 23 from Vox, 40 fro AS 2024-08-19 06:25:17,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4326690.0, ans=0.09899494936611666 2024-08-19 06:25:22,017 WARNING [optim.py:496] (3/4) Scaling gradients by 0.054495006799697876, model_norm_threshold=51.48568344116211 2024-08-19 06:25:22,183 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.10, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.141e+04, grad_sumsq=2.777e+04, orig_rms_sq=3.292e+00 2024-08-19 06:25:36,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4326790.0, ans=0.035 2024-08-19 06:25:37,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4326790.0, ans=0.1 2024-08-19 06:25:44,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4326790.0, ans=0.1 2024-08-19 06:25:46,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4326790.0, ans=0.0 2024-08-19 06:26:03,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4326990.0, ans=0.125 2024-08-19 06:26:04,348 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2600, loss[loss=0.08476, beats_loss=0.01058, ecapa_loss=9.479e-05, whisper_loss=0.07323, over 16166.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01031, ecapa_loss=0.0001397, whisper_loss=0.09068, over 3848266.76 frames. ], batch size: 60, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:26:11,496 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 19 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 06:26:19,895 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
18 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 06:26:21,373 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 17 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 06:26:23,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4327090.0, ans=0.125 2024-08-19 06:26:35,031 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 06:26:41,537 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 06:26:45,669 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.777e+01 2.338e+01 2.555e+01 2.845e+01 9.448e+02, threshold=5.110e+01, percent-clipped=2.0 2024-08-19 06:26:58,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4327290.0, ans=0.2 2024-08-19 06:26:58,761 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2024-08-19 06:27:01,389 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 06:27:23,435 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2650, loss[loss=0.1079, beats_loss=0.008717, ecapa_loss=0.0001869, whisper_loss=0.09736, over 17895.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001397, whisper_loss=0.09015, over 3848429.32 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:27:44,750 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 06:27:46,183 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 06:28:00,304 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
31 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 06:28:41,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2700, loss[loss=0.08179, beats_loss=0.01137, ecapa_loss=0.0001455, whisper_loss=0.06897, over 18976.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01044, ecapa_loss=0.0001395, whisper_loss=0.08929, over 3865568.45 frames. ], batch size: 79, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:29:05,841 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 06:29:07,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4328090.0, ans=0.125 2024-08-19 06:29:08,586 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 06:29:15,590 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 14 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 06:29:21,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4328190.0, ans=0.125 2024-08-19 06:29:24,271 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.748e+01 2.337e+01 2.538e+01 2.914e+01 3.709e+01, threshold=5.076e+01, percent-clipped=0.0 2024-08-19 06:29:25,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4328190.0, ans=0.0 2024-08-19 06:29:36,964 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-19 06:29:37,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4328290.0, ans=0.07 2024-08-19 06:29:42,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4328290.0, ans=0.0 2024-08-19 06:29:50,741 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
28 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 06:29:51,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0 2024-08-19 06:29:53,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.49 vs. limit=15.0 2024-08-19 06:30:00,787 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2750, loss[loss=0.09223, beats_loss=0.01051, ecapa_loss=0.000126, whisper_loss=0.08046, over 17877.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01039, ecapa_loss=0.0001397, whisper_loss=0.09008, over 3863252.00 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:30:12,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4328490.0, ans=0.125 2024-08-19 06:30:20,100 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 33 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 06:30:26,771 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 37 from Vox, 33 fro AS 2024-08-19 06:30:56,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4328790.0, ans=0.0 2024-08-19 06:30:59,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4328790.0, ans=0.125 2024-08-19 06:31:11,605 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 16 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-19 06:31:20,972 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2800, loss[loss=0.09275, beats_loss=0.008179, ecapa_loss=0.0001944, whisper_loss=0.08263, over 16091.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001397, whisper_loss=0.08999, over 3857539.57 frames. 
], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:31:22,894 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-19 06:31:48,700 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 16 from Vox, 35 fro AS 2024-08-19 06:31:49,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4329090.0, ans=0.1 2024-08-19 06:31:57,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2024-08-19 06:32:00,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4329190.0, ans=0.0 2024-08-19 06:32:04,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.348e+01 2.575e+01 2.807e+01 4.733e+01, threshold=5.150e+01, percent-clipped=0.0 2024-08-19 06:32:13,420 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 06:32:19,131 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 06:32:39,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.26 vs. limit=15.0 2024-08-19 06:32:41,438 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2850, loss[loss=0.1174, beats_loss=0.009693, ecapa_loss=0.0001501, whisper_loss=0.1062, over 22150.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01049, ecapa_loss=0.0001394, whisper_loss=0.08978, over 3884267.11 frames. 
], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:33:37,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4329790.0, ans=0.125 2024-08-19 06:34:00,928 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 06:34:01,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4329990.0, ans=0.125 2024-08-19 06:34:01,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2900, loss[loss=0.1112, beats_loss=0.009308, ecapa_loss=0.0001457, whisper_loss=0.1004, over 16412.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001398, whisper_loss=0.08938, over 3876128.12 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:34:10,245 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 20 from LS+wenet, 34 from Vox, 39 fro AS 2024-08-19 06:34:17,007 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4330090.0, ans=0.125 2024-08-19 06:34:18,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4330090.0, ans=0.125 2024-08-19 06:34:20,025 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=12.0 2024-08-19 06:34:26,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4330090.0, ans=0.1 2024-08-19 06:34:27,744 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
22 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 06:34:29,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4330090.0, ans=0.0 2024-08-19 06:34:32,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=4330190.0, ans=12.0 2024-08-19 06:34:46,027 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.702e+01 2.333e+01 2.520e+01 2.841e+01 3.701e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 06:34:46,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4330190.0, ans=0.0 2024-08-19 06:34:48,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4330290.0, ans=0.125 2024-08-19 06:34:55,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4330290.0, ans=0.125 2024-08-19 06:35:02,663 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 06:35:08,655 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 19 from Vox, 49 fro AS 2024-08-19 06:35:16,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4330390.0, ans=0.125 2024-08-19 06:35:20,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4330490.0, ans=0.2 2024-08-19 06:35:21,640 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 2950, loss[loss=0.08468, beats_loss=0.008691, ecapa_loss=0.0002237, whisper_loss=0.07375, over 17370.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001405, whisper_loss=0.08929, over 3874224.62 frames. 
], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:35:25,547 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4330490.0, ans=0.1 2024-08-19 06:35:58,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4330690.0, ans=0.125 2024-08-19 06:36:08,343 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 19 from Vox, 53 fro AS 2024-08-19 06:36:11,806 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=15.0 2024-08-19 06:36:14,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4330790.0, ans=0.0 2024-08-19 06:36:17,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4330790.0, ans=0.2 2024-08-19 06:36:18,341 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 06:36:31,302 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 19 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 06:36:43,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4330990.0, ans=0.125 2024-08-19 06:36:44,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3000, loss[loss=0.1166, beats_loss=0.01121, ecapa_loss=0.0001184, whisper_loss=0.1042, over 22599.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001412, whisper_loss=0.08988, over 3901379.26 frames. 
], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:36:44,149 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 06:37:20,894 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on ASR_libri: loss=0.2537, beats_loss=0, ecapa_loss=0.0005147, whisper_loss=0.2486, over 922467.00 frames. 2024-08-19 06:37:39,390 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on SV_voxceleb1: loss=0.003985, beats_loss=0, ecapa_loss=0.0003985, whisper_loss=0, over 939242.00 frames. 2024-08-19 06:39:27,123 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on AT_audioset: loss=0.02305, beats_loss=0.02305, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 06:39:27,127 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 06:39:48,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4331090.0, ans=0.125 2024-08-19 06:40:00,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4331190.0, ans=0.125 2024-08-19 06:40:11,476 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.401e+01 2.628e+01 2.942e+01 3.563e+02, threshold=5.255e+01, percent-clipped=2.0 2024-08-19 06:40:17,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4331290.0, ans=0.125 2024-08-19 06:40:21,608 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 06:40:33,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4331390.0, ans=0.0 2024-08-19 06:40:39,583 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.231e+05 2024-08-19 06:40:47,477 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05165092274546623, model_norm_threshold=52.55119705200195 2024-08-19 06:40:47,639 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.383e+05, grad_sumsq=1.383e+05, orig_rms_sq=1.000e+00 2024-08-19 06:40:53,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3050, loss[loss=0.09887, beats_loss=0.01019, ecapa_loss=0.0001597, whisper_loss=0.08708, over 19150.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001414, whisper_loss=0.09068, over 3944371.89 frames. ], batch size: 76, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:41:01,094 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 06:41:25,142 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 06:41:34,090 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 30 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 06:41:42,857 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 06:41:50,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4331790.0, ans=0.2 2024-08-19 06:41:52,266 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.41 vs. 
limit=15.0 2024-08-19 06:42:01,259 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2024-08-19 06:42:17,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3100, loss[loss=0.1185, beats_loss=0.01006, ecapa_loss=0.0001511, whisper_loss=0.1069, over 17609.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01043, ecapa_loss=0.0001424, whisper_loss=0.0909, over 3895474.15 frames. ], batch size: 73, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:42:19,493 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 18 from LS+wenet, 15 from Vox, 22 fro AS 2024-08-19 06:42:20,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2024-08-19 06:42:28,797 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4331990.0, ans=0.09899494936611666 2024-08-19 06:42:30,054 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 06:42:48,336 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 20 from Vox, 41 fro AS 2024-08-19 06:42:48,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4332090.0, ans=0.125 2024-08-19 06:42:59,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4332190.0, ans=0.0 2024-08-19 06:43:01,791 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.860e+01 2.302e+01 2.496e+01 2.802e+01 1.017e+03, threshold=4.993e+01, percent-clipped=2.0 2024-08-19 06:43:02,392 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
14 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 06:43:26,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4332390.0, ans=0.125 2024-08-19 06:43:27,528 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 06:43:34,450 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.13 vs. limit=15.0 2024-08-19 06:43:35,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4332390.0, ans=0.1 2024-08-19 06:43:39,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4332490.0, ans=0.125 2024-08-19 06:43:40,237 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3150, loss[loss=0.1126, beats_loss=0.01041, ecapa_loss=0.000146, whisper_loss=0.1007, over 20399.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09085, over 3887215.88 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:43:52,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4332490.0, ans=0.125 2024-08-19 06:44:09,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4332590.0, ans=0.1 2024-08-19 06:44:11,176 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.42 vs. 
limit=22.5 2024-08-19 06:44:17,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4332690.0, ans=0.0 2024-08-19 06:44:21,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4332690.0, ans=10.0 2024-08-19 06:44:28,082 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 06:44:33,210 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4332790.0, ans=0.2 2024-08-19 06:44:34,572 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4332790.0, ans=0.0 2024-08-19 06:44:48,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4332890.0, ans=0.125 2024-08-19 06:44:52,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4332890.0, ans=0.2 2024-08-19 06:45:00,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3200, loss[loss=0.1191, beats_loss=0.01041, ecapa_loss=0.0001717, whisper_loss=0.1069, over 18561.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001418, whisper_loss=0.09125, over 3898621.63 frames. ], batch size: 76, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:45:03,296 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 06:45:06,132 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 06:45:39,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4333190.0, ans=0.2 2024-08-19 06:45:42,804 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.260e+01 2.450e+01 2.731e+01 1.494e+02, threshold=4.900e+01, percent-clipped=1.0 2024-08-19 06:45:51,513 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-08-19 06:45:57,243 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4333290.0, ans=0.125 2024-08-19 06:46:11,115 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.91 vs. limit=15.0 2024-08-19 06:46:12,384 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4333390.0, ans=0.125 2024-08-19 06:46:19,768 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3250, loss[loss=0.09073, beats_loss=0.01039, ecapa_loss=0.0001342, whisper_loss=0.079, over 23627.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.09123, over 3901513.27 frames. ], batch size: 94, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:46:41,445 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 06:46:50,193 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
20 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-19 06:46:59,350 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4333690.0, ans=0.125 2024-08-19 06:47:02,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4333690.0, ans=0.1 2024-08-19 06:47:06,115 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 06:47:14,849 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 06:47:26,948 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4333890.0, ans=0.0 2024-08-19 06:47:34,339 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-08-19 06:47:37,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3300, loss[loss=0.09251, beats_loss=0.01246, ecapa_loss=0.0001133, whisper_loss=0.07892, over 22483.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001426, whisper_loss=0.09095, over 3888280.39 frames. ], batch size: 91, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:47:53,447 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4334090.0, ans=0.125 2024-08-19 06:47:53,847 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.48 vs. limit=15.0 2024-08-19 06:48:02,181 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
23 from LS+wenet, 13 from Vox, 23 fro AS 2024-08-19 06:48:14,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4334190.0, ans=0.1 2024-08-19 06:48:18,348 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.868e+01 2.382e+01 2.607e+01 2.876e+01 9.685e+01, threshold=5.214e+01, percent-clipped=1.0 2024-08-19 06:48:21,864 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4334290.0, ans=0.125 2024-08-19 06:48:23,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4334290.0, ans=0.0 2024-08-19 06:48:40,067 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4334390.0, ans=0.0 2024-08-19 06:48:40,505 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0 2024-08-19 06:48:42,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4334390.0, ans=0.125 2024-08-19 06:48:52,873 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3350, loss[loss=0.1026, beats_loss=0.01243, ecapa_loss=0.000115, whisper_loss=0.08902, over 21701.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01039, ecapa_loss=0.0001428, whisper_loss=0.09121, over 3875951.25 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:49:14,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4334590.0, ans=0.1 2024-08-19 06:49:20,372 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 
26 from LS+wenet, 9 from Vox, 21 fro AS 2024-08-19 06:49:21,035 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.81 vs. limit=12.0 2024-08-19 06:49:24,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4334690.0, ans=0.1 2024-08-19 06:49:50,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4334890.0, ans=0.125 2024-08-19 06:49:53,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4334890.0, ans=0.125 2024-08-19 06:50:02,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4334890.0, ans=0.2 2024-08-19 06:50:05,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3400, loss[loss=0.08925, beats_loss=0.01102, ecapa_loss=0.000147, whisper_loss=0.07676, over 20928.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001411, whisper_loss=0.09037, over 3898190.84 frames. ], batch size: 88, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:50:10,520 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
21 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 06:50:10,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4334990.0, ans=0.2 2024-08-19 06:50:41,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4335190.0, ans=0.07 2024-08-19 06:50:43,282 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.739e+01 2.221e+01 2.426e+01 2.713e+01 1.019e+02, threshold=4.853e+01, percent-clipped=2.0 2024-08-19 06:50:50,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4335290.0, ans=0.125 2024-08-19 06:50:52,823 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 17 from Vox, 39 fro AS 2024-08-19 06:50:54,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4335290.0, ans=0.0 2024-08-19 06:51:15,621 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3450, loss[loss=0.1071, beats_loss=0.01196, ecapa_loss=0.0001473, whisper_loss=0.09363, over 21639.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01054, ecapa_loss=0.0001408, whisper_loss=0.08978, over 3906797.05 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:51:16,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4335490.0, ans=0.1 2024-08-19 06:51:24,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4335490.0, ans=0.125 2024-08-19 06:51:26,701 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 06:51:27,906 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
31 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 06:51:40,015 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 25 from LS+wenet, 20 from Vox, 14 fro AS 2024-08-19 06:51:42,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4335590.0, ans=0.125 2024-08-19 06:51:51,796 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4335690.0, ans=0.07 2024-08-19 06:51:55,787 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 10 from LS+wenet, 15 from Vox, 32 fro AS 2024-08-19 06:51:57,390 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.747e+05 2024-08-19 06:52:02,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4335790.0, ans=0.125 2024-08-19 06:52:12,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-08-19 06:52:20,761 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 23 from Vox, 37 fro AS 2024-08-19 06:52:22,934 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3500, loss[loss=0.1011, beats_loss=0.0111, ecapa_loss=0.0001537, whisper_loss=0.0885, over 22895.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001413, whisper_loss=0.08966, over 3882937.36 frames. 
], batch size: 93, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:52:23,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4335990.0, ans=0.0 2024-08-19 06:52:57,037 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.291e+01 2.555e+01 2.847e+01 3.911e+01, threshold=5.110e+01, percent-clipped=0.0 2024-08-19 06:52:57,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4336190.0, ans=0.125 2024-08-19 06:53:13,482 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 21 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 06:53:19,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4336390.0, ans=0.125 2024-08-19 06:53:19,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4336390.0, ans=0.125 2024-08-19 06:53:23,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4336390.0, ans=0.0 2024-08-19 06:53:25,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3550, loss[loss=0.1004, beats_loss=0.01028, ecapa_loss=0.0001314, whisper_loss=0.08883, over 22847.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001404, whisper_loss=0.08919, over 3887652.52 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:53:54,949 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
34 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 06:53:55,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4336690.0, ans=0.1 2024-08-19 06:53:57,594 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4336690.0, ans=0.125 2024-08-19 06:54:01,914 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 06:54:13,805 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.35 vs. limit=10.0 2024-08-19 06:54:27,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3600, loss[loss=0.1006, beats_loss=0.009632, ecapa_loss=0.0001231, whisper_loss=0.08972, over 16111.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01045, ecapa_loss=0.0001407, whisper_loss=0.08977, over 3883530.47 frames. ], batch size: 62, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:54:29,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4336990.0, ans=0.2 2024-08-19 06:55:00,628 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.896e+01 2.265e+01 2.489e+01 2.858e+01 3.762e+01, threshold=4.979e+01, percent-clipped=0.0 2024-08-19 06:55:11,912 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 41 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-19 06:55:18,170 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
22 from LS+wenet, 33 from Vox, 33 fro AS 2024-08-19 06:55:22,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4337390.0, ans=0.125 2024-08-19 06:55:24,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4337390.0, ans=0.1 2024-08-19 06:55:29,295 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3650, loss[loss=0.09806, beats_loss=0.01133, ecapa_loss=0.0001406, whisper_loss=0.08533, over 21600.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.09008, over 3901274.20 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:55:37,738 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 19 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 06:55:40,176 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09651452302932739, model_norm_threshold=49.788700103759766 2024-08-19 06:55:40,338 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.29, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.699e+04, grad_sumsq=7.699e+04, orig_rms_sq=1.000e+00 2024-08-19 06:55:48,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4337590.0, ans=0.0 2024-08-19 06:56:12,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. 
limit=15.0 2024-08-19 06:56:13,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=4337790.0, ans=0.025 2024-08-19 06:56:16,325 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4337790.0, ans=0.125 2024-08-19 06:56:19,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4337890.0, ans=0.0 2024-08-19 06:56:23,009 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4337890.0, ans=0.1 2024-08-19 06:56:31,131 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 15 from LS+wenet, 18 from Vox, 38 fro AS 2024-08-19 06:56:32,219 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3700, loss[loss=0.07708, beats_loss=0.01192, ecapa_loss=0.0001151, whisper_loss=0.06401, over 18127.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01048, ecapa_loss=0.0001413, whisper_loss=0.08927, over 3858669.42 frames. ], batch size: 71, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:56:43,621 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 06:56:54,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4338090.0, ans=0.2 2024-08-19 06:57:05,626 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.332e+01 2.586e+01 3.000e+01 5.159e+02, threshold=5.172e+01, percent-clipped=5.0 2024-08-19 06:57:06,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4338190.0, ans=0.125 2024-08-19 06:57:07,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4338190.0, ans=0.125 2024-08-19 06:57:07,324 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.470e-01 2024-08-19 06:57:14,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4338290.0, ans=0.2 2024-08-19 06:57:16,871 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 17 from LS+wenet, 12 from Vox, 25 fro AS 2024-08-19 06:57:17,178 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4338290.0, ans=0.125 2024-08-19 06:57:22,002 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 06:57:26,946 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 15 from LS+wenet, 20 from Vox, 19 fro AS 2024-08-19 06:57:34,160 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3750, loss[loss=0.1006, beats_loss=0.00973, ecapa_loss=0.0001459, whisper_loss=0.08942, over 17951.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.08942, over 3849645.49 frames. ], batch size: 72, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:57:42,035 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
31 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 06:57:52,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4338590.0, ans=0.1 2024-08-19 06:57:54,101 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4338590.0, ans=0.1 2024-08-19 06:57:56,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4338590.0, ans=0.125 2024-08-19 06:58:00,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4338690.0, ans=0.04949747468305833 2024-08-19 06:58:10,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4338790.0, ans=0.125 2024-08-19 06:58:16,398 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 14 from Vox, 39 fro AS 2024-08-19 06:58:20,438 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 19 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 06:58:21,569 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 06:58:22,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2024-08-19 06:58:32,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4338890.0, ans=0.09899494936611666 2024-08-19 06:58:35,918 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3800, loss[loss=0.1023, beats_loss=0.01061, ecapa_loss=0.000138, whisper_loss=0.09031, over 22434.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01062, ecapa_loss=0.0001396, whisper_loss=0.08913, over 3868361.34 frames. 
], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:58:41,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4338990.0, ans=0.125 2024-08-19 06:58:41,681 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.45 vs. limit=15.0 2024-08-19 06:58:48,276 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 06:58:58,964 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 06:58:59,775 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 27 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 06:59:06,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4339190.0, ans=0.2 2024-08-19 06:59:09,297 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.974e+01 2.289e+01 2.536e+01 2.898e+01 5.473e+01, threshold=5.073e+01, percent-clipped=1.0 2024-08-19 06:59:22,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.40 vs. limit=15.0 2024-08-19 06:59:26,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4339390.0, ans=0.09899494936611666 2024-08-19 06:59:37,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3850, loss[loss=0.09691, beats_loss=0.01073, ecapa_loss=0.0001249, whisper_loss=0.08494, over 15623.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01054, ecapa_loss=0.0001394, whisper_loss=0.08997, over 3865152.06 frames. ], batch size: 61, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 06:59:44,130 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
30 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 06:59:48,981 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 22 from LS+wenet, 27 from Vox, 44 fro AS 2024-08-19 06:59:55,551 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4339590.0, ans=0.125 2024-08-19 06:59:56,427 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 23 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 06:59:59,160 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4339590.0, ans=0.04949747468305833 2024-08-19 07:00:09,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4339690.0, ans=0.1 2024-08-19 07:00:21,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4339790.0, ans=0.125 2024-08-19 07:00:25,769 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 07:00:32,331 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 07:00:33,714 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4339890.0, ans=0.125 2024-08-19 07:00:39,316 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3900, loss[loss=0.1098, beats_loss=0.01116, ecapa_loss=0.0001656, whisper_loss=0.09697, over 21695.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0105, ecapa_loss=0.0001402, whisper_loss=0.09073, over 3861366.75 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:00:41,911 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-19 07:00:44,945 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4339990.0, ans=0.0 2024-08-19 07:01:07,097 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4340190.0, ans=0.125 2024-08-19 07:01:08,118 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 17 from LS+wenet, 27 from Vox, 21 fro AS 2024-08-19 07:01:09,274 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 07:01:13,021 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.932e+01 2.284e+01 2.480e+01 2.767e+01 3.650e+01, threshold=4.959e+01, percent-clipped=0.0 2024-08-19 07:01:39,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4340390.0, ans=0.0 2024-08-19 07:01:41,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 3950, loss[loss=0.117, beats_loss=0.0102, ecapa_loss=0.0001589, whisper_loss=0.1053, over 16186.00 frames. ], tot_loss[loss=0.1037, beats_loss=0.01039, ecapa_loss=0.0001416, whisper_loss=0.09187, over 3888581.48 frames. ], batch size: 67, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:01:42,172 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.01 vs. 
limit=15.0 2024-08-19 07:01:43,883 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08397600054740906, model_norm_threshold=49.59146499633789 2024-08-19 07:01:44,046 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.26, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=9.206e+04, grad_sumsq=8.842e+06, orig_rms_sq=1.041e-02 2024-08-19 07:01:59,542 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.56 vs. limit=22.5 2024-08-19 07:02:00,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4340590.0, ans=0.2 2024-08-19 07:02:15,284 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2024-08-19 07:02:17,401 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 07:02:41,442 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 07:02:43,743 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4000, loss[loss=0.12, beats_loss=0.00933, ecapa_loss=0.0001609, whisper_loss=0.1091, over 19010.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001416, whisper_loss=0.09138, over 3896380.46 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:02:47,696 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 07:02:50,204 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 07:02:54,348 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.88 vs. 
limit=22.5 2024-08-19 07:03:03,907 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 17 from Vox, 21 fro AS 2024-08-19 07:03:17,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.880e+01 2.399e+01 2.689e+01 3.054e+01 5.905e+02, threshold=5.377e+01, percent-clipped=2.0 2024-08-19 07:03:46,331 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4050, loss[loss=0.09747, beats_loss=0.01168, ecapa_loss=0.0001247, whisper_loss=0.08454, over 19547.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01043, ecapa_loss=0.0001424, whisper_loss=0.09091, over 3871334.46 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:03:58,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4341590.0, ans=0.125 2024-08-19 07:04:00,340 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 07:04:06,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4341590.0, ans=0.125 2024-08-19 07:04:13,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4341690.0, ans=0.0 2024-08-19 07:04:18,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2024-08-19 07:04:38,359 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-08-19 07:04:48,534 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4100, loss[loss=0.08084, beats_loss=0.01075, ecapa_loss=0.0001145, whisper_loss=0.06895, over 21571.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.09089, over 3879035.73 frames. 
], batch size: 81, lr: 2.05e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:05:06,075 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 17 from Vox, 45 fro AS 2024-08-19 07:05:09,314 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-08-19 07:05:16,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4342190.0, ans=0.125 2024-08-19 07:05:20,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=4342190.0, ans=0.2 2024-08-19 07:05:21,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.245e+01 2.633e+01 2.897e+01 5.694e+01, threshold=5.267e+01, percent-clipped=1.0 2024-08-19 07:05:36,766 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4342290.0, ans=0.125 2024-08-19 07:05:39,988 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-19 07:05:41,831 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2024-08-19 07:05:42,646 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4342390.0, ans=0.0 2024-08-19 07:05:46,538 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4342390.0, ans=0.125 2024-08-19 07:05:51,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4150, loss[loss=0.1069, beats_loss=0.009189, ecapa_loss=0.0001534, whisper_loss=0.09621, over 21614.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01051, ecapa_loss=0.0001408, whisper_loss=0.09043, over 3887483.00 frames. 
], batch size: 87, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:05:57,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4342490.0, ans=0.0
2024-08-19 07:06:04,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0
2024-08-19 07:06:06,782 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 18 from Vox, 37 from AS
2024-08-19 07:06:18,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.43 vs. limit=15.0
2024-08-19 07:06:21,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4342690.0, ans=0.1
2024-08-19 07:06:45,534 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 24 from LS+wenet, 20 from Vox, 28 from AS
2024-08-19 07:06:47,872 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.27 vs. limit=22.5
2024-08-19 07:06:54,599 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4200, loss[loss=0.09318, beats_loss=0.009914, ecapa_loss=0.000128, whisper_loss=0.08199, over 18182.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001406, whisper_loss=0.09056, over 3869280.29 frames. ], batch size: 69, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:07:03,563 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts.
14 from LS+wenet, 17 from Vox, 39 from AS
2024-08-19 07:07:04,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4342990.0, ans=0.125
2024-08-19 07:07:05,241 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. limit=15.0
2024-08-19 07:07:11,276 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 16 from Vox, 43 from AS
2024-08-19 07:07:27,342 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 30 from LS+wenet, 13 from Vox, 29 from AS
2024-08-19 07:07:28,653 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.261e+01 2.591e+01 2.854e+01 3.492e+01, threshold=5.183e+01, percent-clipped=0.0
2024-08-19 07:07:31,216 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 37 from LS+wenet, 27 from Vox, 29 from AS
2024-08-19 07:07:36,228 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 25 from LS+wenet, 19 from Vox, 35 from AS
2024-08-19 07:07:41,656 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4343290.0, ans=0.125
2024-08-19 07:07:46,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4343390.0, ans=0.1
2024-08-19 07:07:48,245 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4343390.0, ans=0.1
2024-08-19 07:07:57,617 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4250, loss[loss=0.1071, beats_loss=0.00985, ecapa_loss=0.0001496, whisper_loss=0.09576, over 22216.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001402, whisper_loss=0.09079, over 3875479.86 frames.
], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:07:57,750 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 23 from Vox, 40 from AS
2024-08-19 07:07:58,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4343490.0, ans=0.0
2024-08-19 07:08:04,715 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.15 vs. limit=22.5
2024-08-19 07:08:10,193 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 29 from LS+wenet, 19 from Vox, 28 from AS
2024-08-19 07:08:28,318 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.44 vs. limit=10.0
2024-08-19 07:08:38,548 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 26 from Vox, 30 from AS
2024-08-19 07:08:46,594 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0
2024-08-19 07:08:50,389 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0
2024-08-19 07:08:59,406 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4300, loss[loss=0.0982, beats_loss=0.01205, ecapa_loss=0.0001386, whisper_loss=0.08477, over 13513.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01053, ecapa_loss=0.0001399, whisper_loss=0.09066, over 3876006.36 frames.
], batch size: 56, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:09:19,950 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4344090.0, ans=0.125
2024-08-19 07:09:21,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4344090.0, ans=0.025
2024-08-19 07:09:25,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4344190.0, ans=0.125
2024-08-19 07:09:27,605 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 21 from Vox, 22 from AS
2024-08-19 07:09:30,857 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.98 vs. limit=10.0
2024-08-19 07:09:33,362 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.222e+01 2.440e+01 2.788e+01 3.909e+01, threshold=4.880e+01, percent-clipped=0.0
2024-08-19 07:09:33,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4344190.0, ans=0.125
2024-08-19 07:09:34,739 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 21 from Vox, 34 from AS
2024-08-19 07:09:37,089 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 22 from LS+wenet, 24 from Vox, 43 from AS
2024-08-19 07:09:53,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=4344390.0, ans=0.5
2024-08-19 07:09:57,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4344390.0, ans=0.125
2024-08-19 07:10:01,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4350, loss[loss=0.1034, beats_loss=0.01038, ecapa_loss=0.0001312, whisper_loss=0.0917, over 19775.00 frames.
], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09088, over 3867735.23 frames. ], batch size: 76, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:10:03,445 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 34 from LS+wenet, 25 from Vox, 32 from AS
2024-08-19 07:10:05,765 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 26 from Vox, 28 from AS
2024-08-19 07:10:14,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4344590.0, ans=0.125
2024-08-19 07:10:22,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4344590.0, ans=0.125
2024-08-19 07:10:27,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4344690.0, ans=0.125
2024-08-19 07:10:37,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4344690.0, ans=0.2
2024-08-19 07:11:00,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4344890.0, ans=0.0
2024-08-19 07:11:05,010 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4400, loss[loss=0.09545, beats_loss=0.0109, ecapa_loss=9.48e-05, whisper_loss=0.08361, over 23140.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01037, ecapa_loss=0.0001398, whisper_loss=0.0908, over 3894952.68 frames. ], batch size: 87, lr: 2.05e-03, grad_scale: 1.152921504606847e+18
2024-08-19 07:11:09,906 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 26 from Vox, 34 from AS
2024-08-19 07:11:13,541 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts.
26 from LS+wenet, 32 from Vox, 37 from AS
2024-08-19 07:11:18,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4345090.0, ans=0.1
2024-08-19 07:11:20,942 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 18 from Vox, 24 from AS
2024-08-19 07:11:28,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4345190.0, ans=0.0
2024-08-19 07:11:34,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4345190.0, ans=0.125
2024-08-19 07:11:34,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4345190.0, ans=0.2
2024-08-19 07:11:39,288 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.731e+01 2.205e+01 2.466e+01 2.774e+01 4.446e+01, threshold=4.932e+01, percent-clipped=0.0
2024-08-19 07:11:50,974 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4345290.0, ans=0.125
2024-08-19 07:11:52,314 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4345290.0, ans=0.2
2024-08-19 07:11:58,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4345390.0, ans=10.0
2024-08-19 07:12:04,549 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4345390.0, ans=0.125
2024-08-19 07:12:06,639 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4450, loss[loss=0.07957, beats_loss=0.01299, ecapa_loss=0.0001029, whisper_loss=0.06555, over 23442.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.0001401, whisper_loss=0.09039, over 3896615.69 frames.
], batch size: 94, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:12:18,986 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.78 vs. limit=15.0
2024-08-19 07:12:22,515 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4345590.0, ans=0.1
2024-08-19 07:12:28,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4345590.0, ans=0.125
2024-08-19 07:12:32,305 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 29 from LS+wenet, 31 from Vox, 36 from AS
2024-08-19 07:12:34,009 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0
2024-08-19 07:12:48,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4345790.0, ans=0.125
2024-08-19 07:12:57,270 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 15 from Vox, 34 from AS
2024-08-19 07:13:03,845 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=1.571e-01
2024-08-19 07:13:09,845 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4500, loss[loss=0.1316, beats_loss=0.008948, ecapa_loss=0.0001435, whisper_loss=0.1212, over 23161.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01042, ecapa_loss=0.0001398, whisper_loss=0.09031, over 3883425.43 frames. ], batch size: 89, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:13:12,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4345990.0, ans=0.1
2024-08-19 07:13:16,324 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts.
26 from LS+wenet, 22 from Vox, 25 from AS
2024-08-19 07:13:22,627 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 15 from Vox, 37 from AS
2024-08-19 07:13:37,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4346190.0, ans=0.2
2024-08-19 07:13:39,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4346190.0, ans=0.1
2024-08-19 07:13:43,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4346190.0, ans=0.125
2024-08-19 07:13:45,333 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.773e+01 2.205e+01 2.468e+01 2.775e+01 4.472e+01, threshold=4.935e+01, percent-clipped=0.0
2024-08-19 07:13:51,004 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4346290.0, ans=10.0
2024-08-19 07:13:55,527 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 20 from Vox, 31 from AS
2024-08-19 07:13:57,041 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 24 from LS+wenet, 24 from Vox, 30 from AS
2024-08-19 07:14:00,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4346390.0, ans=0.1
2024-08-19 07:14:03,082 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 25 from Vox, 34 from AS
2024-08-19 07:14:13,006 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4550, loss[loss=0.1122, beats_loss=0.008853, ecapa_loss=0.0001538, whisper_loss=0.1018, over 20257.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01045, ecapa_loss=0.000141, whisper_loss=0.09036, over 3895064.88 frames.
], batch size: 80, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:14:50,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4346790.0, ans=0.125
2024-08-19 07:14:58,426 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.75 vs. limit=15.0
2024-08-19 07:14:59,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4346790.0, ans=0.0
2024-08-19 07:15:00,244 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 26 from LS+wenet, 22 from Vox, 34 from AS
2024-08-19 07:15:08,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4346890.0, ans=0.125
2024-08-19 07:15:13,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4346890.0, ans=0.125
2024-08-19 07:15:15,501 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4600, loss[loss=0.08639, beats_loss=0.00892, ecapa_loss=0.0001562, whisper_loss=0.07591, over 13698.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.0104, ecapa_loss=0.0001422, whisper_loss=0.09105, over 3916557.28 frames. ], batch size: 56, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:15:23,669 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0
2024-08-19 07:15:37,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4347090.0, ans=0.0
2024-08-19 07:15:45,521 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts.
21 from LS+wenet, 15 from Vox, 44 from AS
2024-08-19 07:15:50,494 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.976e+01 2.323e+01 2.576e+01 2.927e+01 9.021e+01, threshold=5.152e+01, percent-clipped=3.0
2024-08-19 07:16:01,441 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 28 from Vox, 27 from AS
2024-08-19 07:16:03,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2024-08-19 07:16:07,964 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 16 from Vox, 26 from AS
2024-08-19 07:16:14,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.89 vs. limit=10.0
2024-08-19 07:16:15,769 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4347390.0, ans=0.09899494936611666
2024-08-19 07:16:17,704 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4650, loss[loss=0.1104, beats_loss=0.009997, ecapa_loss=0.000121, whisper_loss=0.09916, over 19985.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001423, whisper_loss=0.09057, over 3908807.39 frames. ], batch size: 77, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:16:33,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4347590.0, ans=10.0
2024-08-19 07:16:40,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4347590.0, ans=0.125
2024-08-19 07:16:51,584 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 13 from Vox, 25 from AS
2024-08-19 07:16:57,475 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
20 from LS+wenet, 11 from Vox, 23 from AS
2024-08-19 07:17:05,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4347790.0, ans=0.125
2024-08-19 07:17:07,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4347890.0, ans=0.2
2024-08-19 07:17:16,288 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 16 from LS+wenet, 16 from Vox, 22 from AS
2024-08-19 07:17:17,503 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 17 from Vox, 33 from AS
2024-08-19 07:17:19,636 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4700, loss[loss=0.1044, beats_loss=0.01, ecapa_loss=0.0001216, whisper_loss=0.0932, over 21231.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01046, ecapa_loss=0.000141, whisper_loss=0.09128, over 3909722.48 frames. ], batch size: 82, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:17:30,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4347990.0, ans=0.2
2024-08-19 07:17:38,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4348090.0, ans=0.0
2024-08-19 07:17:42,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.74 vs. limit=10.0
2024-08-19 07:17:44,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4348190.0, ans=0.2
2024-08-19 07:17:46,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=4348190.0, ans=0.0
2024-08-19 07:17:47,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4348190.0, ans=0.2
2024-08-19 07:17:54,431 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.829e+01 2.376e+01 2.578e+01 2.929e+01 1.160e+02, threshold=5.156e+01, percent-clipped=1.0
2024-08-19 07:18:00,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4348290.0, ans=0.0
2024-08-19 07:18:10,233 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4348390.0, ans=0.125
2024-08-19 07:18:16,003 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 29 from Vox, 41 from AS
2024-08-19 07:18:21,949 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4750, loss[loss=0.1272, beats_loss=0.01007, ecapa_loss=0.0001003, whisper_loss=0.1161, over 20327.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.09082, over 3911348.04 frames. ], batch size: 74, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:18:34,469 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0
2024-08-19 07:18:44,627 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts.
17 from LS+wenet, 20 from Vox, 26 from AS
2024-08-19 07:18:49,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4348690.0, ans=0.0
2024-08-19 07:19:10,730 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 32 from LS+wenet, 20 from Vox, 45 from AS
2024-08-19 07:19:13,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4348890.0, ans=0.0
2024-08-19 07:19:16,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0
2024-08-19 07:19:17,138 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0
2024-08-19 07:19:23,689 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4800, loss[loss=0.09092, beats_loss=0.01019, ecapa_loss=0.0001523, whisper_loss=0.07921, over 21816.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01046, ecapa_loss=0.0001419, whisper_loss=0.09103, over 3920454.17 frames. ], batch size: 90, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:19:31,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4348990.0, ans=0.0
2024-08-19 07:19:34,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4348990.0, ans=0.0
2024-08-19 07:19:40,382 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
21 from LS+wenet, 21 from Vox, 33 from AS
2024-08-19 07:19:48,115 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4349190.0, ans=0.0
2024-08-19 07:19:58,675 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.754e+01 2.397e+01 2.595e+01 2.947e+01 3.968e+01, threshold=5.190e+01, percent-clipped=1.0
2024-08-19 07:20:00,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4349290.0, ans=0.125
2024-08-19 07:20:01,799 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=12.0
2024-08-19 07:20:14,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4349390.0, ans=0.0
2024-08-19 07:20:17,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4349390.0, ans=0.125
2024-08-19 07:20:26,499 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4850, loss[loss=0.1061, beats_loss=0.008782, ecapa_loss=0.0001626, whisper_loss=0.09567, over 20373.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.09089, over 3923446.24 frames. ], batch size: 84, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:20:27,855 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 21 from LS+wenet, 26 from Vox, 34 from AS
2024-08-19 07:20:30,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4349490.0, ans=0.2
2024-08-19 07:20:34,679 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.30 vs. limit=12.0
2024-08-19 07:20:40,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4349590.0, ans=0.015
2024-08-19 07:20:42,176 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 20 from Vox, 37 from AS
2024-08-19 07:20:47,890 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.62 vs. limit=15.0
2024-08-19 07:20:52,269 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 from AS
2024-08-19 07:20:57,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4349690.0, ans=0.0
2024-08-19 07:21:01,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4349690.0, ans=0.125
2024-08-19 07:21:22,975 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 19 from LS+wenet, 17 from Vox, 28 from AS
2024-08-19 07:21:30,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4900, loss[loss=0.09906, beats_loss=0.00768, ecapa_loss=0.0001524, whisper_loss=0.08985, over 21491.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0105, ecapa_loss=0.0001422, whisper_loss=0.09057, over 3920252.32 frames. ], batch size: 85, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:21:38,610 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 20 from LS+wenet, 16 from Vox, 20 from AS
2024-08-19 07:21:41,098 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts.
27 from LS+wenet, 25 from Vox, 36 from AS
2024-08-19 07:21:49,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4350090.0, ans=0.125
2024-08-19 07:21:49,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4350090.0, ans=0.09899494936611666
2024-08-19 07:21:50,862 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 from AS
2024-08-19 07:21:56,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4350190.0, ans=0.125
2024-08-19 07:21:59,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.60 vs. limit=15.0
2024-08-19 07:22:06,630 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.310e+01 2.480e+01 2.749e+01 3.874e+01, threshold=4.961e+01, percent-clipped=0.0
2024-08-19 07:22:08,025 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 29 from LS+wenet, 20 from Vox, 29 from AS
2024-08-19 07:22:16,093 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 19 from Vox, 31 from AS
2024-08-19 07:22:24,849 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 from AS
2024-08-19 07:22:35,134 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 4950, loss[loss=0.1072, beats_loss=0.008886, ecapa_loss=0.0001451, whisper_loss=0.09685, over 21295.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.0104, ecapa_loss=0.0001423, whisper_loss=0.09076, over 3872841.32 frames. ], batch size: 84, lr: 2.05e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:22:36,691 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
25 from LS+wenet, 25 from Vox, 43 from AS
2024-08-19 07:22:36,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4350490.0, ans=0.0
2024-08-19 07:22:45,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4350490.0, ans=10.0
2024-08-19 07:22:49,719 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 from AS
2024-08-19 07:23:21,251 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 19 from LS+wenet, 16 from Vox, 28 from AS
2024-08-19 07:23:28,633 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 15 from Vox, 32 from AS
2024-08-19 07:23:38,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4350890.0, ans=0.0
2024-08-19 07:23:41,461 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5000, loss[loss=0.1208, beats_loss=0.00828, ecapa_loss=0.0001407, whisper_loss=0.1111, over 17040.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001423, whisper_loss=0.09051, over 3855857.98 frames. ], batch size: 66, lr: 2.04e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:23:43,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2024-08-19 07:23:48,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4350990.0, ans=0.1
2024-08-19 07:23:49,429 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 29 from LS+wenet, 20 from Vox, 40 from AS
2024-08-19 07:23:56,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4351090.0, ans=0.1
2024-08-19 07:24:10,360 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts.
19 from LS+wenet, 25 from Vox, 47 from AS
2024-08-19 07:24:18,092 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.749e+01 2.255e+01 2.539e+01 2.731e+01 4.622e+01, threshold=5.077e+01, percent-clipped=0.0
2024-08-19 07:24:22,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4351290.0, ans=0.07
2024-08-19 07:24:31,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4351290.0, ans=0.1
2024-08-19 07:24:44,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4351390.0, ans=0.0
2024-08-19 07:24:48,471 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5050, loss[loss=0.1182, beats_loss=0.01127, ecapa_loss=9.752e-05, whisper_loss=0.106, over 16367.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01072, ecapa_loss=0.0001415, whisper_loss=0.09015, over 3853046.37 frames. ], batch size: 60, lr: 2.04e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:24:49,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4351490.0, ans=0.2
2024-08-19 07:24:52,335 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 23 from LS+wenet, 22 from Vox, 40 from AS
2024-08-19 07:25:03,974 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 07:25:10,158 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 27 from LS+wenet, 15 from Vox, 38 from AS
2024-08-19 07:25:14,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4351590.0, ans=0.125
2024-08-19 07:25:35,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=15.0
2024-08-19 07:25:43,998 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0
2024-08-19 07:25:45,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4351890.0, ans=0.125
2024-08-19 07:25:51,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4351890.0, ans=0.1
2024-08-19 07:25:57,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5100, loss[loss=0.09777, beats_loss=0.01343, ecapa_loss=0.0001369, whisper_loss=0.08297, over 20020.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01073, ecapa_loss=0.00014, whisper_loss=0.09056, over 3885853.35 frames. ], batch size: 82, lr: 2.04e-03, grad_scale: 5.764607523034235e+17
2024-08-19 07:25:59,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4351990.0, ans=0.2
2024-08-19 07:26:06,299 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=4351990.0, ans=0.05
2024-08-19 07:26:12,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4352090.0, ans=0.125
2024-08-19 07:26:22,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4352190.0, ans=0.125
2024-08-19 07:26:27,348 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 07:26:33,203 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.991e+01 2.383e+01 2.590e+01 2.899e+01 2.370e+02, threshold=5.180e+01, percent-clipped=1.0 2024-08-19 07:26:37,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4352290.0, ans=0.2 2024-08-19 07:26:41,484 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4352290.0, ans=0.125 2024-08-19 07:26:47,431 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4352390.0, ans=0.0 2024-08-19 07:27:01,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5150, loss[loss=0.09697, beats_loss=0.01194, ecapa_loss=0.0001422, whisper_loss=0.08361, over 22000.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01073, ecapa_loss=0.0001407, whisper_loss=0.09021, over 3908787.26 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:27:15,605 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=22.5 2024-08-19 07:27:26,558 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2024-08-19 07:27:39,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4352790.0, ans=0.2 2024-08-19 07:27:42,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4352790.0, ans=0.05 2024-08-19 07:27:43,176 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
23 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 07:28:03,162 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5200, loss[loss=0.09717, beats_loss=0.0125, ecapa_loss=0.0001275, whisper_loss=0.08339, over 17442.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01067, ecapa_loss=0.0001406, whisper_loss=0.08973, over 3880650.63 frames. ], batch size: 70, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:28:03,947 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.57 vs. limit=22.5 2024-08-19 07:28:04,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4352990.0, ans=0.0 2024-08-19 07:28:06,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4352990.0, ans=0.0 2024-08-19 07:28:22,168 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 14 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 07:28:38,980 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.425e+01 2.708e+01 3.005e+01 4.495e+01, threshold=5.416e+01, percent-clipped=0.0 2024-08-19 07:28:39,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4353190.0, ans=0.1 2024-08-19 07:28:40,381 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 35 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 07:28:45,181 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 07:28:56,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.25 vs. 
limit=15.0 2024-08-19 07:28:58,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4353390.0, ans=0.125 2024-08-19 07:29:06,546 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5250, loss[loss=0.07325, beats_loss=0.01108, ecapa_loss=0.0001541, whisper_loss=0.06063, over 12809.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001405, whisper_loss=0.08992, over 3875371.15 frames. ], batch size: 54, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:29:17,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-08-19 07:29:24,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4353590.0, ans=0.0 2024-08-19 07:29:38,472 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4353690.0, ans=0.0 2024-08-19 07:29:42,293 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2024-08-19 07:29:46,543 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
26 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 07:29:47,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4353790.0, ans=0.0 2024-08-19 07:29:49,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4353790.0, ans=0.0 2024-08-19 07:29:50,404 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4353790.0, ans=0.125 2024-08-19 07:30:05,006 WARNING [optim.py:496] (3/4) Scaling gradients by 0.035648033022880554, model_norm_threshold=54.15937423706055 2024-08-19 07:30:05,168 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.11, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.460e+05, grad_sumsq=4.288e+05, orig_rms_sq=5.737e-01 2024-08-19 07:30:08,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5300, loss[loss=0.09601, beats_loss=0.01252, ecapa_loss=0.0001796, whisper_loss=0.0817, over 21166.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01063, ecapa_loss=0.0001415, whisper_loss=0.09015, over 3884416.30 frames. 
], batch size: 92, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:30:43,236 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.902e+01 2.382e+01 2.697e+01 3.037e+01 1.519e+03, threshold=5.395e+01, percent-clipped=1.0 2024-08-19 07:30:43,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4354190.0, ans=0.125 2024-08-19 07:30:46,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4354290.0, ans=0.1 2024-08-19 07:30:48,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4354290.0, ans=0.125 2024-08-19 07:30:56,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.22 vs. limit=22.5 2024-08-19 07:31:06,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4354390.0, ans=0.0 2024-08-19 07:31:11,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5350, loss[loss=0.1174, beats_loss=0.00876, ecapa_loss=0.0001405, whisper_loss=0.1073, over 15362.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01058, ecapa_loss=0.0001403, whisper_loss=0.08975, over 3847238.47 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:31:34,763 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 07:31:36,242 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 13 from LS+wenet, 16 from Vox, 29 fro AS 2024-08-19 07:31:45,274 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 07:31:50,349 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 
22 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 07:31:50,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4354790.0, ans=0.125 2024-08-19 07:31:58,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4354790.0, ans=0.2 2024-08-19 07:31:58,987 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 07:32:13,037 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 28 from LS+wenet, 20 from Vox, 22 fro AS 2024-08-19 07:32:14,143 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5400, loss[loss=0.1203, beats_loss=0.007099, ecapa_loss=0.0001598, whisper_loss=0.1116, over 18092.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01048, ecapa_loss=0.0001405, whisper_loss=0.09011, over 3812085.16 frames. ], batch size: 70, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:32:14,320 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 25 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 07:32:15,493 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 21 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 07:32:25,407 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
30 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 07:32:30,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4355090.0, ans=0.5 2024-08-19 07:32:30,931 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4355090.0, ans=0.2 2024-08-19 07:32:39,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4355190.0, ans=0.0 2024-08-19 07:32:49,115 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.864e+01 2.281e+01 2.508e+01 2.776e+01 3.569e+02, threshold=5.016e+01, percent-clipped=2.0 2024-08-19 07:32:50,532 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 07:33:12,010 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4355390.0, ans=0.2 2024-08-19 07:33:16,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5450, loss[loss=0.1062, beats_loss=0.009766, ecapa_loss=0.0001521, whisper_loss=0.0949, over 21917.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01038, ecapa_loss=0.0001407, whisper_loss=0.09133, over 3803312.05 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:33:20,837 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=4355490.0, ans=0.05 2024-08-19 07:33:27,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=4355490.0, ans=0.1 2024-08-19 07:33:29,393 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 37 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 07:33:31,751 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
18 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 07:33:34,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4355590.0, ans=0.2 2024-08-19 07:33:39,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.64 vs. limit=15.0 2024-08-19 07:34:01,639 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 20 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 07:34:03,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4355790.0, ans=0.125 2024-08-19 07:34:10,321 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 07:34:19,218 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5500, loss[loss=0.09922, beats_loss=0.01075, ecapa_loss=0.0001422, whisper_loss=0.08705, over 20011.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01049, ecapa_loss=0.0001401, whisper_loss=0.09077, over 3834909.75 frames. ], batch size: 79, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:34:21,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4355990.0, ans=0.0 2024-08-19 07:34:27,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4355990.0, ans=0.0 2024-08-19 07:34:28,868 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.58 vs. limit=15.0 2024-08-19 07:34:31,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4356090.0, ans=0.2 2024-08-19 07:34:45,787 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
25 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 07:34:53,918 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.332e+01 2.514e+01 2.807e+01 3.996e+01, threshold=5.028e+01, percent-clipped=0.0 2024-08-19 07:34:58,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4356290.0, ans=0.125 2024-08-19 07:35:16,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4356390.0, ans=0.1 2024-08-19 07:35:21,822 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5550, loss[loss=0.117, beats_loss=0.007417, ecapa_loss=0.000183, whisper_loss=0.1078, over 21702.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0105, ecapa_loss=0.0001416, whisper_loss=0.09081, over 3842646.99 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:35:23,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4356490.0, ans=0.125 2024-08-19 07:35:40,866 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4356590.0, ans=10.0 2024-08-19 07:35:43,166 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:35:55,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4356690.0, ans=0.125 2024-08-19 07:36:20,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4356890.0, ans=0.125 2024-08-19 07:36:23,778 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5600, loss[loss=0.09113, beats_loss=0.01207, ecapa_loss=0.000104, whisper_loss=0.07801, over 22037.00 frames. 
], tot_loss[loss=0.1025, beats_loss=0.01054, ecapa_loss=0.0001415, whisper_loss=0.0905, over 3854115.25 frames. ], batch size: 85, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:36:29,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4356990.0, ans=0.125 2024-08-19 07:36:32,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4356990.0, ans=0.0 2024-08-19 07:36:58,537 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.345e+01 2.547e+01 2.734e+01 3.198e+02, threshold=5.093e+01, percent-clipped=3.0 2024-08-19 07:37:03,571 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 07:37:03,889 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4357290.0, ans=0.0 2024-08-19 07:37:08,338 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 07:37:10,913 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 22 from Vox, 41 fro AS 2024-08-19 07:37:21,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4357390.0, ans=0.1 2024-08-19 07:37:23,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4357390.0, ans=0.125 2024-08-19 07:37:25,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5650, loss[loss=0.09989, beats_loss=0.01087, ecapa_loss=0.0001414, whisper_loss=0.0876, over 21375.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01057, ecapa_loss=0.0001419, whisper_loss=0.09013, over 3882126.26 frames. 
], batch size: 84, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:37:28,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4357490.0, ans=0.0 2024-08-19 07:37:30,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.22 vs. limit=15.0 2024-08-19 07:37:31,757 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 07:37:32,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.51 vs. limit=15.0 2024-08-19 07:37:33,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4357490.0, ans=0.125 2024-08-19 07:37:34,078 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 37 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 07:38:06,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4357790.0, ans=0.125 2024-08-19 07:38:10,147 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 07:38:15,312 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4357890.0, ans=0.1 2024-08-19 07:38:19,689 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.45 vs. limit=22.5 2024-08-19 07:38:20,136 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 07:38:27,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5700, loss[loss=0.123, beats_loss=0.009312, ecapa_loss=0.0001408, whisper_loss=0.1123, over 18432.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001435, whisper_loss=0.09073, over 3891655.75 frames. ], batch size: 71, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:38:34,907 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 07:38:43,820 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 07:38:43,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4358090.0, ans=0.125 2024-08-19 07:38:57,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4358190.0, ans=0.2 2024-08-19 07:38:59,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.68 vs. limit=22.5 2024-08-19 07:39:00,281 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.694e-01 2024-08-19 07:39:02,166 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.849e+01 2.294e+01 2.534e+01 2.806e+01 5.396e+01, threshold=5.067e+01, percent-clipped=1.0 2024-08-19 07:39:12,531 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=2.702e-01 2024-08-19 07:39:21,243 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 32 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 07:39:22,330 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 07:39:25,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4358390.0, ans=0.125 2024-08-19 07:39:29,532 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5750, loss[loss=0.1165, beats_loss=0.007483, ecapa_loss=0.0001723, whisper_loss=0.1073, over 17513.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01036, ecapa_loss=0.0001438, whisper_loss=0.09144, over 3893400.86 frames. ], batch size: 71, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:39:46,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4358590.0, ans=0.09899494936611666 2024-08-19 07:40:20,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4358890.0, ans=0.2 2024-08-19 07:40:21,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4358890.0, ans=0.0 2024-08-19 07:40:24,433 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=15.0 2024-08-19 07:40:27,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4358890.0, ans=0.1 2024-08-19 07:40:32,179 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5800, loss[loss=0.1143, beats_loss=0.008744, ecapa_loss=0.0001304, whisper_loss=0.1042, over 22486.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001452, whisper_loss=0.09103, over 3852102.02 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:40:32,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4358990.0, ans=0.2 2024-08-19 07:40:33,810 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
25 from LS+wenet, 21 from Vox, 27 fro AS 2024-08-19 07:40:42,738 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.184e+05 2024-08-19 07:40:51,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4359090.0, ans=0.125 2024-08-19 07:40:58,520 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 23 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 07:41:01,124 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 07:41:06,915 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.813e+01 2.217e+01 2.506e+01 2.797e+01 5.801e+01, threshold=5.013e+01, percent-clipped=1.0 2024-08-19 07:41:09,961 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2024-08-19 07:41:13,301 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 fro AS 2024-08-19 07:41:14,905 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:41:15,782 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 22 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 07:41:22,286 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 07:41:23,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4359390.0, ans=0.0 2024-08-19 07:41:30,472 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=5.71 vs. limit=12.0 2024-08-19 07:41:34,307 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5850, loss[loss=0.09292, beats_loss=0.01275, ecapa_loss=9.83e-05, whisper_loss=0.07919, over 15688.00 frames. 
], tot_loss[loss=0.1026, beats_loss=0.01036, ecapa_loss=0.000144, whisper_loss=0.0908, over 3865293.73 frames. ], batch size: 61, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:41:35,842 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 25 from Vox, 42 fro AS 2024-08-19 07:41:37,079 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-19 07:41:54,048 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 07:41:58,090 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4359690.0, ans=0.125 2024-08-19 07:42:01,534 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 07:42:10,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4359790.0, ans=0.125 2024-08-19 07:42:15,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4359790.0, ans=0.2 2024-08-19 07:42:19,754 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2024-08-19 07:42:36,790 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5900, loss[loss=0.07872, beats_loss=0.01442, ecapa_loss=0.0001454, whisper_loss=0.06285, over 20799.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001436, whisper_loss=0.09029, over 3844501.05 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:42:41,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.23 vs. 
limit=12.0 2024-08-19 07:42:46,895 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4359990.0, ans=0.125 2024-08-19 07:42:47,782 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 07:43:06,218 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 07:43:13,513 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.890e+01 2.336e+01 2.599e+01 2.904e+01 5.543e+01, threshold=5.198e+01, percent-clipped=1.0 2024-08-19 07:43:16,369 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 07:43:20,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4360290.0, ans=0.0 2024-08-19 07:43:21,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4360290.0, ans=0.125 2024-08-19 07:43:30,901 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 23 from Vox, 23 fro AS 2024-08-19 07:43:32,353 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 30 from Vox, 37 fro AS 2024-08-19 07:43:40,783 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 5950, loss[loss=0.1024, beats_loss=0.009991, ecapa_loss=0.0001573, whisper_loss=0.09084, over 20237.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01054, ecapa_loss=0.0001426, whisper_loss=0.08949, over 3861380.34 frames. ], batch size: 80, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:43:49,152 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.75 vs. limit=12.0 2024-08-19 07:43:59,786 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 07:44:43,771 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6000, loss[loss=0.1031, beats_loss=0.01083, ecapa_loss=0.0001418, whisper_loss=0.09089, over 17006.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01064, ecapa_loss=0.0001421, whisper_loss=0.08912, over 3876322.71 frames. ], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:44:43,771 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 07:45:18,123 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on ASR_libri: loss=0.2522, beats_loss=0, ecapa_loss=0.0005153, whisper_loss=0.2471, over 922467.00 frames. 2024-08-19 07:45:36,071 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on SV_voxceleb1: loss=0.004094, beats_loss=0, ecapa_loss=0.0004094, whisper_loss=0, over 939242.00 frames. 2024-08-19 07:47:13,287 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on AT_audioset: loss=0.02301, beats_loss=0.02301, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 07:47:13,290 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 07:47:22,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4360990.0, ans=0.125 2024-08-19 07:47:35,706 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 07:47:44,460 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 07:47:45,771 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 07:47:48,006 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.878e+01 2.272e+01 2.512e+01 2.834e+01 4.204e+02, threshold=5.024e+01, percent-clipped=2.0 2024-08-19 07:47:49,957 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4361290.0, ans=0.0 2024-08-19 07:47:53,242 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 07:48:05,682 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 24 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 07:48:15,555 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6050, loss[loss=0.09648, beats_loss=0.01048, ecapa_loss=0.000125, whisper_loss=0.08475, over 16283.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01065, ecapa_loss=0.0001414, whisper_loss=0.0889, over 3877637.75 frames. ], batch size: 62, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:48:20,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4361490.0, ans=0.2 2024-08-19 07:48:28,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4361590.0, ans=0.0 2024-08-19 07:48:33,787 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.70 vs. 
limit=12.0 2024-08-19 07:48:42,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4361690.0, ans=0.1 2024-08-19 07:48:45,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4361690.0, ans=0.125 2024-08-19 07:48:50,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4361690.0, ans=0.125 2024-08-19 07:49:01,231 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 07:49:09,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4361890.0, ans=0.125 2024-08-19 07:49:17,004 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6100, loss[loss=0.1092, beats_loss=0.007094, ecapa_loss=0.0001472, whisper_loss=0.1006, over 21651.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001415, whisper_loss=0.08934, over 3886404.26 frames. ], batch size: 83, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:49:36,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4362090.0, ans=0.05 2024-08-19 07:49:36,912 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.54 vs. limit=10.0 2024-08-19 07:49:46,267 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
25 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 07:49:51,996 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.346e+01 2.570e+01 2.899e+01 1.665e+02, threshold=5.140e+01, percent-clipped=1.0 2024-08-19 07:50:10,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4362390.0, ans=0.125 2024-08-19 07:50:14,538 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 07:50:19,220 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6150, loss[loss=0.07386, beats_loss=0.01209, ecapa_loss=0.0001024, whisper_loss=0.06074, over 17014.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01058, ecapa_loss=0.0001419, whisper_loss=0.08955, over 3873656.87 frames. ], batch size: 66, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:50:28,059 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 07:50:34,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2024-08-19 07:50:40,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4362590.0, ans=0.0 2024-08-19 07:50:42,366 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2024-08-19 07:50:57,017 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 19 from Vox, 37 fro AS 2024-08-19 07:51:10,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4362890.0, ans=0.0 2024-08-19 07:51:17,014 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
19 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 07:51:21,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6200, loss[loss=0.1018, beats_loss=0.01219, ecapa_loss=0.0001332, whisper_loss=0.08832, over 21263.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01064, ecapa_loss=0.0001413, whisper_loss=0.08941, over 3921638.85 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:51:28,804 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 07:51:32,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4362990.0, ans=0.0 2024-08-19 07:51:44,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4363090.0, ans=0.1 2024-08-19 07:51:57,099 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.005e+01 2.318e+01 2.578e+01 2.832e+01 1.795e+02, threshold=5.155e+01, percent-clipped=1.0 2024-08-19 07:52:00,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4363290.0, ans=0.125 2024-08-19 07:52:01,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4363290.0, ans=0.0 2024-08-19 07:52:01,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4363290.0, ans=0.125 2024-08-19 07:52:10,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4363290.0, ans=10.0 2024-08-19 07:52:24,454 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6250, loss[loss=0.1084, beats_loss=0.008109, ecapa_loss=0.0001234, whisper_loss=0.09908, over 16270.00 frames. 
], tot_loss[loss=0.1017, beats_loss=0.01064, ecapa_loss=0.0001423, whisper_loss=0.08966, over 3907174.30 frames. ], batch size: 61, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:52:36,391 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4363590.0, ans=0.0 2024-08-19 07:52:36,743 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.69 vs. limit=22.5 2024-08-19 07:52:53,699 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 29 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 07:52:53,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4363690.0, ans=0.125 2024-08-19 07:52:56,027 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 24 from Vox, 43 fro AS 2024-08-19 07:52:57,467 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4363690.0, ans=0.125 2024-08-19 07:52:58,619 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4363690.0, ans=0.125 2024-08-19 07:53:12,294 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 28 from Vox, 37 fro AS 2024-08-19 07:53:13,607 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 07:53:26,962 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6300, loss[loss=0.1144, beats_loss=0.009681, ecapa_loss=0.0001823, whisper_loss=0.1028, over 14888.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01065, ecapa_loss=0.0001416, whisper_loss=0.08957, over 3905083.70 frames. 
], batch size: 60, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:53:27,630 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.125e-02 2024-08-19 07:53:35,033 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 20 from Vox, 46 fro AS 2024-08-19 07:53:38,624 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 07:53:51,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4364190.0, ans=0.1 2024-08-19 07:53:58,582 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 18 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 07:54:02,110 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.975e+01 2.421e+01 2.740e+01 3.001e+01 4.558e+01, threshold=5.480e+01, percent-clipped=0.0 2024-08-19 07:54:04,231 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.89 vs. limit=15.0 2024-08-19 07:54:13,639 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 07:54:17,648 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.24 vs. limit=10.0 2024-08-19 07:54:17,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.69 vs. limit=15.0 2024-08-19 07:54:29,817 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6350, loss[loss=0.124, beats_loss=0.006981, ecapa_loss=0.0001206, whisper_loss=0.1158, over 14980.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01055, ecapa_loss=0.0001423, whisper_loss=0.09019, over 3892748.49 frames. 
], batch size: 54, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:54:33,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4364490.0, ans=0.1 2024-08-19 07:54:39,116 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2024-08-19 07:54:46,301 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4364590.0, ans=0.0 2024-08-19 07:54:46,408 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=12.0 2024-08-19 07:54:55,858 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 19 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 07:55:04,388 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 07:55:09,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4364790.0, ans=0.0 2024-08-19 07:55:12,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4364790.0, ans=0.125 2024-08-19 07:55:22,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.96 vs. limit=15.0 2024-08-19 07:55:32,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.28 vs. limit=10.0 2024-08-19 07:55:33,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6400, loss[loss=0.1059, beats_loss=0.00973, ecapa_loss=0.0001603, whisper_loss=0.09452, over 22733.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01056, ecapa_loss=0.0001417, whisper_loss=0.09027, over 3925891.38 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 07:55:51,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4365090.0, ans=0.125 2024-08-19 07:55:55,622 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 16 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 07:56:02,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4365190.0, ans=0.125 2024-08-19 07:56:15,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4365190.0, ans=0.2 2024-08-19 07:56:16,379 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.326e+01 2.535e+01 2.915e+01 6.831e+01, threshold=5.071e+01, percent-clipped=1.0 2024-08-19 07:56:17,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4365190.0, ans=0.125 2024-08-19 07:56:56,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6450, loss[loss=0.08895, beats_loss=0.0115, ecapa_loss=0.0001432, whisper_loss=0.07601, over 16790.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01059, ecapa_loss=0.0001411, whisper_loss=0.09012, over 3935645.68 frames. ], batch size: 71, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:57:18,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4365590.0, ans=0.125 2024-08-19 07:57:45,839 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
23 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 07:57:55,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4365790.0, ans=0.125 2024-08-19 07:58:23,581 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.25 vs. limit=22.5 2024-08-19 07:58:26,222 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6500, loss[loss=0.09939, beats_loss=0.01211, ecapa_loss=0.0001487, whisper_loss=0.0858, over 21071.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.09158, over 3945312.75 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 07:58:41,052 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4365990.0, ans=0.0 2024-08-19 07:59:29,242 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.973e+01 2.467e+01 2.714e+01 3.135e+01 4.455e+01, threshold=5.427e+01, percent-clipped=0.0 2024-08-19 07:59:34,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.19 vs. limit=22.5 2024-08-19 07:59:37,398 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 07:59:49,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4366290.0, ans=0.0 2024-08-19 08:00:05,465 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 08:00:15,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6550, loss[loss=0.1081, beats_loss=0.01091, ecapa_loss=0.0001262, whisper_loss=0.09593, over 15827.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01049, ecapa_loss=0.0001413, whisper_loss=0.09088, over 3918592.41 frames. 
], batch size: 61, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 08:00:28,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4366490.0, ans=0.125 2024-08-19 08:00:31,952 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 08:00:35,378 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4366490.0, ans=0.125 2024-08-19 08:00:38,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4366590.0, ans=0.125 2024-08-19 08:01:09,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.33 vs. limit=15.0 2024-08-19 08:01:14,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4366690.0, ans=0.125 2024-08-19 08:01:23,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4366790.0, ans=0.0 2024-08-19 08:01:55,155 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 23 from LS+wenet, 19 from Vox, 51 fro AS 2024-08-19 08:01:58,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=4366890.0, ans=15.0 2024-08-19 08:02:07,318 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6600, loss[loss=0.102, beats_loss=0.01108, ecapa_loss=0.0001382, whisper_loss=0.08949, over 20321.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01044, ecapa_loss=0.0001422, whisper_loss=0.0909, over 3930326.99 frames. 
], batch size: 82, lr: 2.04e-03, grad_scale: 1.152921504606847e+18 2024-08-19 08:02:13,960 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:02:35,842 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4367090.0, ans=0.125 2024-08-19 08:03:07,764 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4367190.0, ans=0.125 2024-08-19 08:03:08,905 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4367190.0, ans=0.0 2024-08-19 08:03:13,615 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.306e+01 2.532e+01 2.841e+01 4.066e+01, threshold=5.063e+01, percent-clipped=0.0 2024-08-19 08:03:21,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4367290.0, ans=0.0 2024-08-19 08:03:30,439 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 26 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 08:03:43,479 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6650, loss[loss=0.1191, beats_loss=0.009052, ecapa_loss=0.0001361, whisper_loss=0.1087, over 23108.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01047, ecapa_loss=0.0001427, whisper_loss=0.09009, over 3920730.48 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:03:44,737 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
30 from LS+wenet, 27 from Vox, 37 fro AS 2024-08-19 08:03:46,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4367490.0, ans=0.2 2024-08-19 08:03:51,502 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=15.0 2024-08-19 08:03:53,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4367490.0, ans=0.125 2024-08-19 08:03:59,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4367590.0, ans=0.125 2024-08-19 08:04:01,273 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4367590.0, ans=0.05 2024-08-19 08:04:02,185 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 08:04:03,541 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 14 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 08:04:11,621 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.61 vs. limit=15.0 2024-08-19 08:04:12,455 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4367690.0, ans=0.0 2024-08-19 08:04:18,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4367690.0, ans=0.2 2024-08-19 08:04:19,453 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
21 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 08:04:29,688 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4367790.0, ans=0.0 2024-08-19 08:04:29,713 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4367790.0, ans=0.125 2024-08-19 08:04:36,446 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 08:04:39,437 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 08:04:43,398 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.645e+05 2024-08-19 08:04:52,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4367890.0, ans=0.0 2024-08-19 08:04:57,273 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6700, loss[loss=0.09229, beats_loss=0.01236, ecapa_loss=0.0001126, whisper_loss=0.07881, over 18998.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.09024, over 3909834.49 frames. ], batch size: 75, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:05:01,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4367990.0, ans=0.0 2024-08-19 08:05:14,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4368090.0, ans=0.0 2024-08-19 08:05:15,091 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 08:05:21,154 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 19 from Vox, 36 fro AS 2024-08-19 08:05:24,355 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
18 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 08:05:39,666 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 12 from Vox, 31 fro AS 2024-08-19 08:05:41,377 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4368190.0, ans=0.125 2024-08-19 08:05:42,029 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.605e+01 2.280e+01 2.491e+01 2.766e+01 3.799e+01, threshold=4.981e+01, percent-clipped=0.0 2024-08-19 08:05:46,217 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4368290.0, ans=0.05 2024-08-19 08:06:07,134 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 08:06:13,701 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6750, loss[loss=0.0825, beats_loss=0.01043, ecapa_loss=0.000188, whisper_loss=0.07019, over 20560.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01046, ecapa_loss=0.0001423, whisper_loss=0.09033, over 3893283.79 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:06:25,641 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:06:29,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4368590.0, ans=0.125 2024-08-19 08:07:01,966 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 11 from LS+wenet, 25 from Vox, 25 fro AS 2024-08-19 08:07:03,213 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 08:07:04,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4368790.0, ans=0.0 2024-08-19 08:07:09,561 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
21 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 08:07:09,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4368790.0, ans=0.125 2024-08-19 08:07:19,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4368890.0, ans=0.125 2024-08-19 08:07:26,784 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.14 vs. limit=22.5 2024-08-19 08:07:28,586 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6800, loss[loss=0.1032, beats_loss=0.01039, ecapa_loss=0.0001365, whisper_loss=0.09148, over 18825.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001431, whisper_loss=0.08948, over 3872308.36 frames. ], batch size: 74, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:07:45,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4369090.0, ans=0.1 2024-08-19 08:07:47,957 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 08:07:53,324 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. 
limit=6.0 2024-08-19 08:07:54,849 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4369090.0, ans=0.125 2024-08-19 08:07:54,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4369090.0, ans=0.0 2024-08-19 08:08:11,803 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.854e+01 2.446e+01 2.587e+01 2.881e+01 4.116e+01, threshold=5.174e+01, percent-clipped=0.0 2024-08-19 08:08:16,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4369290.0, ans=0.0 2024-08-19 08:08:16,933 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.82 vs. limit=22.5 2024-08-19 08:08:27,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=4369390.0, ans=0.025 2024-08-19 08:08:27,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=4369390.0, ans=0.09899494936611666 2024-08-19 08:08:28,772 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2024-08-19 08:08:42,303 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6850, loss[loss=0.09729, beats_loss=0.009692, ecapa_loss=0.0001616, whisper_loss=0.08598, over 16879.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001428, whisper_loss=0.08953, over 3877339.15 frames. 
], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:08:53,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4369490.0, ans=0.1 2024-08-19 08:09:03,644 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 08:09:11,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4369690.0, ans=0.125 2024-08-19 08:09:38,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4369790.0, ans=0.0 2024-08-19 08:09:38,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4369790.0, ans=0.0 2024-08-19 08:09:43,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4369890.0, ans=0.2 2024-08-19 08:09:51,106 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:09:54,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4369890.0, ans=10.0 2024-08-19 08:09:57,524 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6900, loss[loss=0.05567, beats_loss=0.01363, ecapa_loss=0.0001489, whisper_loss=0.04055, over 21182.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01052, ecapa_loss=0.0001433, whisper_loss=0.08946, over 3856522.96 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:10:01,516 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.89 vs. limit=15.0 2024-08-19 08:10:08,641 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
24 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 08:10:11,350 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 33 from LS+wenet, 13 from Vox, 21 fro AS 2024-08-19 08:10:19,776 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2024-08-19 08:10:40,726 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.856e+01 2.283e+01 2.474e+01 2.721e+01 3.268e+01, threshold=4.948e+01, percent-clipped=0.0 2024-08-19 08:10:42,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=4370290.0, ans=10.0 2024-08-19 08:10:46,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.75 vs. limit=15.0 2024-08-19 08:10:49,616 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:10:53,096 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 24 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 08:10:55,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4370290.0, ans=0.125 2024-08-19 08:11:10,048 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4370390.0, ans=0.05 2024-08-19 08:11:12,280 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 6950, loss[loss=0.1158, beats_loss=0.007941, ecapa_loss=0.0001733, whisper_loss=0.1061, over 14444.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001437, whisper_loss=0.08943, over 3849635.92 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:11:30,987 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 08:11:35,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4370590.0, ans=0.125 2024-08-19 08:11:35,908 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 27 from Vox, 39 fro AS 2024-08-19 08:12:13,812 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 14 from Vox, 42 fro AS 2024-08-19 08:12:15,221 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 27 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 08:12:17,192 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4370890.0, ans=0.1 2024-08-19 08:12:25,243 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.68 vs. limit=10.0 2024-08-19 08:12:28,950 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7000, loss[loss=0.06846, beats_loss=0.01338, ecapa_loss=0.0001658, whisper_loss=0.05342, over 18403.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001432, whisper_loss=0.0897, over 3828157.45 frames. 
], batch size: 81, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:12:45,696 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4371090.0, ans=0.0 2024-08-19 08:13:02,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4371190.0, ans=0.125 2024-08-19 08:13:04,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4371190.0, ans=0.125 2024-08-19 08:13:11,937 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.284e+01 2.585e+01 2.897e+01 5.224e+01, threshold=5.169e+01, percent-clipped=1.0 2024-08-19 08:13:12,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2024-08-19 08:13:15,578 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 08:13:23,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4371290.0, ans=0.0 2024-08-19 08:13:25,471 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 30 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 08:13:26,619 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-19 08:13:38,706 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 08:13:42,970 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7050, loss[loss=0.1005, beats_loss=0.009254, ecapa_loss=0.0002025, whisper_loss=0.08919, over 19763.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001435, whisper_loss=0.08995, over 3863290.78 frames. 
], batch size: 85, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:13:49,578 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 08:14:00,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=4371590.0, ans=0.02 2024-08-19 08:14:36,093 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 08:14:37,557 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 16 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 08:14:39,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.82 vs. limit=10.0 2024-08-19 08:15:02,238 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7100, loss[loss=0.08628, beats_loss=0.01282, ecapa_loss=0.0001209, whisper_loss=0.07225, over 17770.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.0001414, whisper_loss=0.08979, over 3894320.30 frames. ], batch size: 70, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:15:34,600 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 08:15:34,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=4372190.0, ans=0.05 2024-08-19 08:15:36,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.67 vs. limit=22.5 2024-08-19 08:15:37,888 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 08:15:46,038 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
32 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 08:15:47,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4372190.0, ans=0.0 2024-08-19 08:15:48,446 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.334e+01 2.574e+01 2.776e+01 4.254e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 08:15:53,300 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 35 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 08:16:03,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4372390.0, ans=0.0 2024-08-19 08:16:18,490 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7150, loss[loss=0.09717, beats_loss=0.01073, ecapa_loss=0.0001124, whisper_loss=0.08532, over 15713.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01052, ecapa_loss=0.0001416, whisper_loss=0.08957, over 3886449.48 frames. ], batch size: 59, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:16:34,097 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 08:16:37,497 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=22.5 2024-08-19 08:16:47,181 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 08:17:13,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4372790.0, ans=0.125 2024-08-19 08:17:16,728 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 
17 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 08:17:20,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4372890.0, ans=0.125 2024-08-19 08:17:20,346 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4372890.0, ans=0.125 2024-08-19 08:17:30,804 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 28 from Vox, 40 fro AS 2024-08-19 08:17:36,043 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 08:17:37,070 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7200, loss[loss=0.106, beats_loss=0.01075, ecapa_loss=0.0001329, whisper_loss=0.09396, over 21875.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01054, ecapa_loss=0.0001422, whisper_loss=0.08988, over 3921299.81 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:17:57,634 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 20 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 08:18:07,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=4373090.0, ans=15.0 2024-08-19 08:18:09,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=8.0 2024-08-19 08:18:23,254 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.885e+01 2.339e+01 2.591e+01 2.933e+01 7.006e+01, threshold=5.182e+01, percent-clipped=1.0 2024-08-19 08:18:35,850 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 08:18:38,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=4.01 vs. 
limit=15.0 2024-08-19 08:18:54,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4373390.0, ans=0.125 2024-08-19 08:18:56,172 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7250, loss[loss=0.1125, beats_loss=0.01035, ecapa_loss=0.0001362, whisper_loss=0.1008, over 23414.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09087, over 3946238.46 frames. ], batch size: 94, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:19:06,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4373490.0, ans=0.0 2024-08-19 08:19:43,297 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.40 vs. limit=22.5 2024-08-19 08:19:55,018 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 27 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 08:20:15,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7300, loss[loss=0.1103, beats_loss=0.008956, ecapa_loss=0.0001483, whisper_loss=0.09988, over 22945.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01041, ecapa_loss=0.0001415, whisper_loss=0.09148, over 3949938.66 frames. ], batch size: 92, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:20:39,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4374090.0, ans=0.1 2024-08-19 08:20:52,711 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 24 from Vox, 25 fro AS 2024-08-19 08:20:52,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4374190.0, ans=0.125 2024-08-19 08:20:58,470 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 08:21:04,225 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-19 08:21:04,708 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.917e+01 2.340e+01 2.529e+01 2.737e+01 3.250e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 08:21:08,863 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 29 from Vox, 36 fro AS 2024-08-19 08:21:27,932 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 13 from LS+wenet, 11 from Vox, 35 fro AS 2024-08-19 08:21:38,091 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7350, loss[loss=0.09508, beats_loss=0.01235, ecapa_loss=0.0001316, whisper_loss=0.08142, over 23043.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01039, ecapa_loss=0.0001421, whisper_loss=0.09131, over 3956987.10 frames. ], batch size: 93, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:21:40,796 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.51 vs. limit=15.0 2024-08-19 08:21:49,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4374490.0, ans=0.125 2024-08-19 08:21:55,111 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4374590.0, ans=0.125 2024-08-19 08:22:07,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4374690.0, ans=0.125 2024-08-19 08:22:16,129 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 08:22:40,478 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
25 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 08:22:42,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4374890.0, ans=0.125 2024-08-19 08:22:50,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4374890.0, ans=0.1 2024-08-19 08:22:54,675 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7400, loss[loss=0.07961, beats_loss=0.01321, ecapa_loss=0.0001095, whisper_loss=0.06531, over 19117.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01036, ecapa_loss=0.0001415, whisper_loss=0.0912, over 3938000.26 frames. ], batch size: 75, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:23:06,406 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 08:23:06,874 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4374990.0, ans=0.125 2024-08-19 08:23:10,725 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2024-08-19 08:23:21,493 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4375090.0, ans=0.2 2024-08-19 08:23:33,897 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4375190.0, ans=0.1 2024-08-19 08:23:43,113 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.350e+01 2.542e+01 2.861e+01 4.984e+02, threshold=5.085e+01, percent-clipped=1.0 2024-08-19 08:24:00,185 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2024-08-19 08:24:15,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7450, loss[loss=0.1127, beats_loss=0.008342, ecapa_loss=0.0001688, whisper_loss=0.1027, over 21387.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01031, ecapa_loss=0.0001426, whisper_loss=0.09151, over 3918994.98 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:24:17,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4375490.0, ans=0.125 2024-08-19 08:24:21,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4375490.0, ans=0.0 2024-08-19 08:24:31,821 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4375590.0, ans=0.1 2024-08-19 08:24:33,186 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 22 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 08:24:40,112 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 08:24:52,055 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 08:25:16,034 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 08:25:30,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7500, loss[loss=0.1088, beats_loss=0.01084, ecapa_loss=9.025e-05, whisper_loss=0.09709, over 20795.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01026, ecapa_loss=0.000143, whisper_loss=0.09154, over 3889715.95 frames. ], batch size: 75, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:25:30,146 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 27 from Vox, 40 fro AS 2024-08-19 08:25:36,773 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4375990.0, ans=0.125 2024-08-19 08:25:42,329 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 08:25:47,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-08-19 08:25:49,381 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4376090.0, ans=0.1 2024-08-19 08:25:52,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4376090.0, ans=0.125 2024-08-19 08:25:57,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4376090.0, ans=0.5 2024-08-19 08:26:00,267 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4376190.0, ans=0.125 2024-08-19 08:26:12,077 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.516e+01 2.304e+01 2.520e+01 2.744e+01 4.658e+01, threshold=5.040e+01, percent-clipped=0.0 2024-08-19 08:26:12,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4376290.0, ans=0.125 2024-08-19 08:26:18,595 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. 
limit=6.0 2024-08-19 08:26:26,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4376290.0, ans=0.0 2024-08-19 08:26:26,471 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4376290.0, ans=0.125 2024-08-19 08:26:27,305 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 08:26:28,063 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2024-08-19 08:26:30,387 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 19 from LS+wenet, 23 from Vox, 54 fro AS 2024-08-19 08:26:37,284 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:26:41,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4376390.0, ans=0.125 2024-08-19 08:26:44,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7550, loss[loss=0.1109, beats_loss=0.01163, ecapa_loss=0.0001487, whisper_loss=0.09774, over 19819.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01035, ecapa_loss=0.0001425, whisper_loss=0.09135, over 3876900.77 frames. ], batch size: 77, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:26:49,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4376490.0, ans=0.125 2024-08-19 08:26:55,752 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 08:26:59,220 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
14 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 08:27:10,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4376590.0, ans=0.07 2024-08-19 08:27:19,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4376690.0, ans=0.125 2024-08-19 08:27:43,458 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 08:27:52,887 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 08:28:01,321 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7600, loss[loss=0.1161, beats_loss=0.009475, ecapa_loss=0.0001632, whisper_loss=0.105, over 17237.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01042, ecapa_loss=0.0001416, whisper_loss=0.09055, over 3861104.89 frames. ], batch size: 68, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:28:02,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4376990.0, ans=0.0 2024-08-19 08:28:11,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4376990.0, ans=0.125 2024-08-19 08:28:12,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4376990.0, ans=0.125 2024-08-19 08:28:17,987 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 22 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 08:28:42,622 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 08:28:45,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.278e+01 2.433e+01 2.694e+01 5.084e+01, threshold=4.867e+01, percent-clipped=1.0 2024-08-19 08:28:45,139 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
25 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 08:28:49,143 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 19 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 08:28:52,002 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 18 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 08:29:11,501 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4377390.0, ans=0.125 2024-08-19 08:29:14,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4377490.0, ans=0.125 2024-08-19 08:29:15,029 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7650, loss[loss=0.1082, beats_loss=0.00977, ecapa_loss=0.0001528, whisper_loss=0.09691, over 16216.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01036, ecapa_loss=0.0001414, whisper_loss=0.09114, over 3872198.46 frames. ], batch size: 64, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:29:20,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4377490.0, ans=0.05 2024-08-19 08:29:20,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4377490.0, ans=0.1 2024-08-19 08:29:56,794 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 08:30:22,391 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.51 vs. limit=15.0 2024-08-19 08:30:24,334 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7700, loss[loss=0.1088, beats_loss=0.009425, ecapa_loss=0.0001441, whisper_loss=0.09798, over 22790.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01035, ecapa_loss=0.0001419, whisper_loss=0.09052, over 3882954.72 frames. 
], batch size: 90, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:30:39,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4378090.0, ans=0.125 2024-08-19 08:30:41,255 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-08-19 08:31:03,702 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 23 from Vox, 28 fro AS 2024-08-19 08:31:05,963 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.389e+01 2.553e+01 2.809e+01 4.632e+01, threshold=5.107e+01, percent-clipped=0.0 2024-08-19 08:31:06,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4378290.0, ans=0.0 2024-08-19 08:31:07,499 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 22 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 08:31:10,473 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 19 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 08:31:12,387 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.41 vs. limit=12.0 2024-08-19 08:31:16,750 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 21 from Vox, 18 fro AS 2024-08-19 08:31:34,547 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7750, loss[loss=0.09995, beats_loss=0.008308, ecapa_loss=0.0001319, whisper_loss=0.09033, over 23502.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01039, ecapa_loss=0.0001414, whisper_loss=0.08949, over 3850962.59 frames. ], batch size: 91, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:31:37,362 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
34 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 08:31:43,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4378490.0, ans=0.125 2024-08-19 08:31:48,237 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 15 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 08:31:50,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5 2024-08-19 08:32:11,695 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:32:16,706 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 08:32:23,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4378790.0, ans=0.0 2024-08-19 08:32:28,482 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 08:32:28,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4378890.0, ans=0.0 2024-08-19 08:32:31,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.51 vs. limit=15.0 2024-08-19 08:32:41,814 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7800, loss[loss=0.09326, beats_loss=0.01154, ecapa_loss=0.0001631, whisper_loss=0.08009, over 20550.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01042, ecapa_loss=0.0001417, whisper_loss=0.08963, over 3856262.59 frames. 
], batch size: 88, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:33:16,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4379190.0, ans=0.125 2024-08-19 08:33:20,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4379190.0, ans=0.125 2024-08-19 08:33:20,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4379190.0, ans=0.0 2024-08-19 08:33:20,977 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.843e+01 2.315e+01 2.563e+01 2.893e+01 6.411e+01, threshold=5.126e+01, percent-clipped=2.0 2024-08-19 08:33:40,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=12.16 vs. limit=12.0 2024-08-19 08:33:41,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4379390.0, ans=0.125 2024-08-19 08:33:48,982 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7850, loss[loss=0.0996, beats_loss=0.01068, ecapa_loss=0.0001402, whisper_loss=0.08752, over 18193.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01049, ecapa_loss=0.0001423, whisper_loss=0.08974, over 3890442.96 frames. ], batch size: 73, lr: 2.04e-03, grad_scale: 5.764607523034235e+17 2024-08-19 08:33:58,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4379490.0, ans=0.0 2024-08-19 08:34:28,334 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 21 from Vox, 40 fro AS 2024-08-19 08:34:54,668 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7900, loss[loss=0.102, beats_loss=0.01113, ecapa_loss=0.000122, whisper_loss=0.08968, over 23640.00 frames. 
], tot_loss[loss=0.1023, beats_loss=0.01044, ecapa_loss=0.0001422, whisper_loss=0.09045, over 3921730.92 frames. ], batch size: 92, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:35:11,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=5.53 vs. limit=12.0 2024-08-19 08:35:19,567 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 08:35:22,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4380190.0, ans=0.035 2024-08-19 08:35:22,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4380190.0, ans=0.125 2024-08-19 08:35:23,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4380190.0, ans=0.2 2024-08-19 08:35:34,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.772e+01 2.323e+01 2.601e+01 2.970e+01 4.832e+01, threshold=5.202e+01, percent-clipped=0.0 2024-08-19 08:35:35,496 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 22 from LS+wenet, 20 from Vox, 28 fro AS 2024-08-19 08:36:01,048 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 7950, loss[loss=0.09727, beats_loss=0.009351, ecapa_loss=0.0001152, whisper_loss=0.08677, over 16154.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01049, ecapa_loss=0.0001416, whisper_loss=0.08997, over 3894631.65 frames. ], batch size: 62, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:36:15,058 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 
33 from LS+wenet, 11 from Vox, 18 fro AS 2024-08-19 08:36:17,015 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:36:21,356 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.02 vs. limit=12.0 2024-08-19 08:36:38,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4380690.0, ans=0.125 2024-08-19 08:36:49,912 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 26 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 08:36:54,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. limit=10.0 2024-08-19 08:37:08,550 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8000, loss[loss=0.08328, beats_loss=0.01035, ecapa_loss=0.0001727, whisper_loss=0.07121, over 16045.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.0104, ecapa_loss=0.0001419, whisper_loss=0.09043, over 3876791.74 frames. ], batch size: 66, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:37:08,699 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 08:37:15,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2024-08-19 08:37:16,486 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 08:37:26,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4381090.0, ans=0.1 2024-08-19 08:37:28,601 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
21 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 08:37:31,012 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.20 vs. limit=22.5 2024-08-19 08:37:32,798 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 08:37:41,332 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4381190.0, ans=0.2 2024-08-19 08:37:46,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4381190.0, ans=0.2 2024-08-19 08:37:48,863 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.335e+01 2.541e+01 2.792e+01 1.974e+02, threshold=5.082e+01, percent-clipped=1.0 2024-08-19 08:38:00,998 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 08:38:02,681 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 21 from LS+wenet, 24 from Vox, 40 fro AS 2024-08-19 08:38:07,171 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 08:38:08,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4381390.0, ans=0.1 2024-08-19 08:38:15,402 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 21 from LS+wenet, 11 from Vox, 22 fro AS 2024-08-19 08:38:16,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8050, loss[loss=0.1134, beats_loss=0.01001, ecapa_loss=0.000107, whisper_loss=0.1023, over 14577.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.09003, over 3896855.70 frames. 
], batch size: 54, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:38:20,008 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=12.0 2024-08-19 08:38:29,593 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 26 from Vox, 30 fro AS 2024-08-19 08:39:11,783 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 08:39:14,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4381890.0, ans=0.125 2024-08-19 08:39:22,041 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 08:39:26,669 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-19 08:39:29,635 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8100, loss[loss=0.1105, beats_loss=0.008183, ecapa_loss=0.0001598, whisper_loss=0.1007, over 18509.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01043, ecapa_loss=0.0001407, whisper_loss=0.0904, over 3906746.83 frames. ], batch size: 73, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:39:44,462 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 08:39:51,019 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 22 from Vox, 25 fro AS 2024-08-19 08:40:06,141 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 20 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 08:40:08,106 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.51 vs. limit=15.0 2024-08-19 08:40:10,128 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
27 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 08:40:12,651 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.771e+01 2.219e+01 2.443e+01 2.808e+01 4.973e+01, threshold=4.885e+01, percent-clipped=0.0 2024-08-19 08:40:12,767 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 08:40:20,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4382290.0, ans=0.125 2024-08-19 08:40:20,985 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 08:40:41,174 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8150, loss[loss=0.1073, beats_loss=0.01075, ecapa_loss=0.0001284, whisper_loss=0.09527, over 21642.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09049, over 3913093.37 frames. ], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:40:43,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=4.58 vs. limit=12.0 2024-08-19 08:40:53,876 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 08:40:55,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4382590.0, ans=0.125 2024-08-19 08:41:18,661 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 33 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 08:41:19,238 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4382690.0, ans=0.0 2024-08-19 08:41:26,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4382790.0, ans=0.2 2024-08-19 08:41:32,134 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
24 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 08:41:32,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4382790.0, ans=0.125 2024-08-19 08:41:33,086 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2024-08-19 08:41:38,964 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4382890.0, ans=0.1 2024-08-19 08:41:52,721 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8200, loss[loss=0.0909, beats_loss=0.01063, ecapa_loss=0.00013, whisper_loss=0.07896, over 19638.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001411, whisper_loss=0.09039, over 3919599.29 frames. ], batch size: 79, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:42:01,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4382990.0, ans=0.125 2024-08-19 08:42:25,016 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 08:42:35,677 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.301e+01 2.611e+01 2.872e+01 3.807e+01, threshold=5.223e+01, percent-clipped=0.0 2024-08-19 08:42:35,993 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 08:42:56,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4383390.0, ans=0.125 2024-08-19 08:42:57,918 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4383390.0, ans=0.1 2024-08-19 08:42:58,795 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
23 from LS+wenet, 18 from Vox, 14 fro AS 2024-08-19 08:43:01,393 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 08:43:03,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8250, loss[loss=0.1092, beats_loss=0.01027, ecapa_loss=0.000107, whisper_loss=0.09783, over 15768.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0104, ecapa_loss=0.0001412, whisper_loss=0.08997, over 3924460.48 frames. ], batch size: 58, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:43:14,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=12.0 2024-08-19 08:43:28,661 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.23 vs. limit=22.5 2024-08-19 08:43:35,089 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2024-08-19 08:44:16,665 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-19 08:44:22,523 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8300, loss[loss=0.1135, beats_loss=0.01015, ecapa_loss=0.0001261, whisper_loss=0.1021, over 20649.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001401, whisper_loss=0.08995, over 3901101.29 frames. ], batch size: 79, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:44:33,752 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. 
limit=15.0 2024-08-19 08:44:43,122 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4384090.0, ans=0.07 2024-08-19 08:44:52,851 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4384190.0, ans=0.1 2024-08-19 08:44:54,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4384190.0, ans=0.125 2024-08-19 08:45:00,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4384190.0, ans=0.125 2024-08-19 08:45:00,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.56 vs. limit=15.0 2024-08-19 08:45:01,196 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 17 from Vox, 38 fro AS 2024-08-19 08:45:06,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4384290.0, ans=0.07 2024-08-19 08:45:07,633 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.452e+01 2.719e+01 3.109e+01 1.763e+02, threshold=5.438e+01, percent-clipped=2.0 2024-08-19 08:45:20,229 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 37 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 08:45:22,571 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-08-19 08:45:24,049 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
25 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 08:45:30,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4384390.0, ans=0.125 2024-08-19 08:45:31,338 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-19 08:45:33,059 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4384390.0, ans=0.125 2024-08-19 08:45:36,742 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8350, loss[loss=0.0897, beats_loss=0.01043, ecapa_loss=0.0001402, whisper_loss=0.07787, over 15247.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001404, whisper_loss=0.09031, over 3889283.79 frames. ], batch size: 63, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:46:01,142 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 08:46:03,865 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-08-19 08:46:13,103 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
16 from LS+wenet, 12 from Vox, 38 fro AS 2024-08-19 08:46:22,259 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4384790.0, ans=0.0 2024-08-19 08:46:31,450 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4384790.0, ans=0.125 2024-08-19 08:46:37,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4384890.0, ans=0.5 2024-08-19 08:46:40,018 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4384890.0, ans=0.95 2024-08-19 08:46:45,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4384890.0, ans=0.2 2024-08-19 08:46:47,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4384890.0, ans=0.125 2024-08-19 08:46:50,717 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8400, loss[loss=0.1181, beats_loss=0.008803, ecapa_loss=0.0001485, whisper_loss=0.1078, over 18861.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01029, ecapa_loss=0.0001427, whisper_loss=0.09138, over 3883792.24 frames. ], batch size: 72, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:47:00,555 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 08:47:04,046 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
24 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 08:47:32,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4385190.0, ans=0.125 2024-08-19 08:47:39,327 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.758e+01 2.283e+01 2.484e+01 2.806e+01 4.179e+01, threshold=4.968e+01, percent-clipped=0.0 2024-08-19 08:47:48,667 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 17 from Vox, 26 fro AS 2024-08-19 08:47:51,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4385290.0, ans=0.125 2024-08-19 08:47:53,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4385290.0, ans=0.2 2024-08-19 08:48:02,691 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 08:48:04,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4385390.0, ans=0.125 2024-08-19 08:48:06,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4385390.0, ans=0.125 2024-08-19 08:48:10,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8450, loss[loss=0.1161, beats_loss=0.01062, ecapa_loss=0.0001461, whisper_loss=0.104, over 22427.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01033, ecapa_loss=0.0001422, whisper_loss=0.09079, over 3873768.62 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:48:36,094 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. 
limit=15.0 2024-08-19 08:48:41,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4385690.0, ans=0.035 2024-08-19 08:48:41,179 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.553e+01 2024-08-19 08:48:56,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.98 vs. limit=15.0 2024-08-19 08:49:22,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8500, loss[loss=0.111, beats_loss=0.01049, ecapa_loss=0.0001489, whisper_loss=0.09899, over 19900.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.000143, whisper_loss=0.08991, over 3870946.89 frames. ], batch size: 82, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:49:24,932 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.03 vs. limit=22.5 2024-08-19 08:49:41,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4386090.0, ans=0.1 2024-08-19 08:49:44,929 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 19 from Vox, 43 fro AS 2024-08-19 08:49:45,195 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4386090.0, ans=0.125 2024-08-19 08:50:06,340 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.801e+01 2.300e+01 2.586e+01 2.886e+01 4.322e+01, threshold=5.173e+01, percent-clipped=0.0 2024-08-19 08:50:06,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=4386290.0, ans=0.125 2024-08-19 08:50:31,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4386390.0, ans=0.125 2024-08-19 08:50:36,433 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8550, loss[loss=0.1034, beats_loss=0.01126, ecapa_loss=0.0001456, whisper_loss=0.09067, over 21126.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001423, whisper_loss=0.09033, over 3893523.32 frames. ], batch size: 90, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:50:50,140 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4386490.0, ans=0.0 2024-08-19 08:50:59,856 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 25 from Vox, 19 fro AS 2024-08-19 08:51:03,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4386590.0, ans=0.2 2024-08-19 08:51:04,879 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
12 from LS+wenet, 28 from Vox, 19 fro AS 2024-08-19 08:51:05,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4386590.0, ans=0.125 2024-08-19 08:51:08,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4386690.0, ans=0.0 2024-08-19 08:51:16,624 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 13 from Vox, 44 fro AS 2024-08-19 08:51:33,035 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:51:35,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4386790.0, ans=0.125 2024-08-19 08:51:36,252 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 27 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 08:51:38,285 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.61 vs. limit=10.0 2024-08-19 08:51:39,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4386890.0, ans=0.1 2024-08-19 08:51:41,006 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4386890.0, ans=0.1 2024-08-19 08:51:41,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4386890.0, ans=0.0 2024-08-19 08:51:41,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.16 vs. limit=12.0 2024-08-19 08:51:45,033 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. 
limit=6.0 2024-08-19 08:51:45,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4386890.0, ans=0.1 2024-08-19 08:51:48,758 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 08:51:53,652 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8600, loss[loss=0.1136, beats_loss=0.01021, ecapa_loss=0.000111, whisper_loss=0.1023, over 22799.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01035, ecapa_loss=0.0001434, whisper_loss=0.09027, over 3884733.63 frames. ], batch size: 84, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:52:06,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2024-08-19 08:52:28,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4387190.0, ans=0.1 2024-08-19 08:52:35,556 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4387190.0, ans=0.1 2024-08-19 08:52:38,999 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.282e+01 2.546e+01 2.881e+01 4.529e+01, threshold=5.091e+01, percent-clipped=0.0 2024-08-19 08:52:48,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4387290.0, ans=0.125 2024-08-19 08:53:06,413 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8650, loss[loss=0.09487, beats_loss=0.01031, ecapa_loss=0.0001598, whisper_loss=0.08296, over 20767.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01036, ecapa_loss=0.000145, whisper_loss=0.08983, over 3863054.73 frames. 
], batch size: 88, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:53:10,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4387490.0, ans=0.0 2024-08-19 08:53:31,847 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 13 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 08:53:32,069 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 08:53:40,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4387690.0, ans=15.0 2024-08-19 08:53:49,092 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 22 from Vox, 39 fro AS 2024-08-19 08:53:53,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=4387790.0, ans=0.025 2024-08-19 08:53:55,287 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 20 from LS+wenet, 15 from Vox, 19 fro AS 2024-08-19 08:53:59,501 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 08:54:06,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4387890.0, ans=0.125 2024-08-19 08:54:12,823 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4387890.0, ans=0.0 2024-08-19 08:54:17,932 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8700, loss[loss=0.1193, beats_loss=0.009142, ecapa_loss=0.0001376, whisper_loss=0.1088, over 23295.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01038, ecapa_loss=0.0001446, whisper_loss=0.09025, over 3871727.09 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:54:26,129 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
16 from LS+wenet, 8 from Vox, 34 fro AS 2024-08-19 08:54:37,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4388090.0, ans=0.125 2024-08-19 08:54:40,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4388090.0, ans=0.1 2024-08-19 08:54:46,146 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4388190.0, ans=0.0 2024-08-19 08:54:47,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4388190.0, ans=0.0 2024-08-19 08:54:47,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4388190.0, ans=0.2 2024-08-19 08:54:50,130 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4388190.0, ans=0.1 2024-08-19 08:54:50,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4388190.0, ans=0.0 2024-08-19 08:54:57,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.36 vs. limit=10.0 2024-08-19 08:54:59,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.796e+01 2.272e+01 2.455e+01 2.713e+01 3.409e+01, threshold=4.910e+01, percent-clipped=0.0 2024-08-19 08:54:59,619 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 08:55:02,817 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 21 from Vox, 47 fro AS 2024-08-19 08:55:07,084 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.92 vs. 
limit=15.0 2024-08-19 08:55:07,951 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 08:55:11,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=12.0 2024-08-19 08:55:12,532 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 9 from LS+wenet, 26 from Vox, 38 fro AS 2024-08-19 08:55:19,896 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.64 vs. limit=22.5 2024-08-19 08:55:29,304 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8750, loss[loss=0.09839, beats_loss=0.01055, ecapa_loss=0.000139, whisper_loss=0.08645, over 15454.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01038, ecapa_loss=0.0001442, whisper_loss=0.08979, over 3863284.37 frames. ], batch size: 61, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:55:29,470 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 34 from Vox, 34 fro AS 2024-08-19 08:55:30,232 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4388490.0, ans=0.2 2024-08-19 08:55:31,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4388490.0, ans=0.125 2024-08-19 08:55:41,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4388490.0, ans=0.0 2024-08-19 08:55:44,390 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 27 from Vox, 26 fro AS 2024-08-19 08:55:56,377 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
24 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 08:56:01,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4388690.0, ans=0.2 2024-08-19 08:56:05,859 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 08:56:10,198 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 33 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 08:56:26,651 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 08:56:35,084 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4388890.0, ans=0.1 2024-08-19 08:56:38,516 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 13 from Vox, 39 fro AS 2024-08-19 08:56:44,536 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8800, loss[loss=0.1044, beats_loss=0.01021, ecapa_loss=0.0001272, whisper_loss=0.09291, over 23086.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.000144, whisper_loss=0.08984, over 3860029.96 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:56:46,241 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4388990.0, ans=0.0 2024-08-19 08:57:00,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4389090.0, ans=0.125 2024-08-19 08:57:05,508 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.99 vs. 
limit=12.0 2024-08-19 08:57:07,391 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08632224053144455, model_norm_threshold=49.099090576171875 2024-08-19 08:57:07,554 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.22, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.037e+04, grad_sumsq=6.733e+06, orig_rms_sq=1.045e-02 2024-08-19 08:57:12,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4389190.0, ans=0.125 2024-08-19 08:57:18,622 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 08:57:27,447 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.642e+01 2.295e+01 2.616e+01 2.854e+01 5.688e+02, threshold=5.231e+01, percent-clipped=2.0 2024-08-19 08:57:30,992 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 08:57:39,505 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 08:57:50,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4389390.0, ans=0.95 2024-08-19 08:57:53,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4389390.0, ans=0.0 2024-08-19 08:57:57,252 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8850, loss[loss=0.1052, beats_loss=0.00973, ecapa_loss=0.0001339, whisper_loss=0.09415, over 23190.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001421, whisper_loss=0.08977, over 3886500.73 frames. 
], batch size: 86, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:57:58,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4389490.0, ans=0.125 2024-08-19 08:58:03,792 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4389490.0, ans=0.0 2024-08-19 08:58:07,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4389490.0, ans=0.125 2024-08-19 08:58:23,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4389590.0, ans=0.2 2024-08-19 08:58:25,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4389590.0, ans=0.0 2024-08-19 08:58:46,021 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 27 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 08:58:47,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=4389790.0, ans=12.0 2024-08-19 08:58:48,360 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 21 from LS+wenet, 15 from Vox, 20 fro AS 2024-08-19 08:58:58,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4389890.0, ans=0.125 2024-08-19 08:59:01,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4389890.0, ans=0.0 2024-08-19 08:59:12,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8900, loss[loss=0.1045, beats_loss=0.01028, ecapa_loss=0.0001143, whisper_loss=0.09305, over 14784.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.0901, over 3847645.69 frames. 
], batch size: 55, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 08:59:12,867 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 29 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 08:59:17,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4389990.0, ans=0.025 2024-08-19 08:59:27,317 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 29 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 08:59:30,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4390090.0, ans=0.0 2024-08-19 08:59:56,106 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.858e+01 2.234e+01 2.571e+01 2.897e+01 3.544e+02, threshold=5.141e+01, percent-clipped=1.0 2024-08-19 09:00:20,147 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 09:00:25,795 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 8950, loss[loss=0.09732, beats_loss=0.009093, ecapa_loss=0.000161, whisper_loss=0.08661, over 17113.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01047, ecapa_loss=0.000142, whisper_loss=0.09006, over 3846185.55 frames. ], batch size: 72, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:00:47,460 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 09:00:52,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4390590.0, ans=0.125 2024-08-19 09:00:56,168 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 09:00:58,764 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 09:00:59,680 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-08-19 09:01:07,149 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 31 from LS+wenet, 19 from Vox, 30 fro AS 2024-08-19 09:01:19,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4390790.0, ans=0.125 2024-08-19 09:01:36,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9000, loss[loss=0.09029, beats_loss=0.0117, ecapa_loss=0.0001646, whisper_loss=0.07694, over 20234.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01052, ecapa_loss=0.000141, whisper_loss=0.09023, over 3846884.99 frames. ], batch size: 86, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:01:36,617 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 09:02:14,378 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on ASR_libri: loss=0.2532, beats_loss=0, ecapa_loss=0.0005125, whisper_loss=0.2481, over 922467.00 frames. 2024-08-19 09:02:33,134 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on SV_voxceleb1: loss=0.003997, beats_loss=0, ecapa_loss=0.0003997, whisper_loss=0, over 939242.00 frames. 2024-08-19 09:04:17,192 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on AT_audioset: loss=0.02307, beats_loss=0.02307, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 09:04:17,195 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 09:04:18,585 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 20 from LS+wenet, 32 from Vox, 39 fro AS 2024-08-19 09:04:20,326 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.94 vs. 
limit=15.0 2024-08-19 09:04:21,782 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4390990.0, ans=0.125 2024-08-19 09:04:28,669 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 30 from Vox, 28 fro AS 2024-08-19 09:04:41,538 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 09:04:49,425 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.76 vs. limit=15.0 2024-08-19 09:04:55,051 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2024-08-19 09:05:01,276 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.794e+01 2.350e+01 2.627e+01 2.965e+01 6.113e+01, threshold=5.254e+01, percent-clipped=2.0 2024-08-19 09:05:08,863 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4391290.0, ans=0.1 2024-08-19 09:05:10,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2024-08-19 09:05:19,880 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=4391390.0, ans=0.1 2024-08-19 09:05:26,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4391390.0, ans=0.0 2024-08-19 09:05:33,570 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9050, loss[loss=0.1119, beats_loss=0.009203, ecapa_loss=0.0001493, whisper_loss=0.1012, over 20324.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.000142, whisper_loss=0.08987, over 3838484.58 frames. 
], batch size: 77, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:05:40,467 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 09:06:03,729 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 21 from Vox, 26 fro AS 2024-08-19 09:06:11,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=4391690.0, ans=0.125 2024-08-19 09:06:15,894 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4391690.0, ans=0.125 2024-08-19 09:06:53,984 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9100, loss[loss=0.1216, beats_loss=0.008976, ecapa_loss=0.0001757, whisper_loss=0.1109, over 22486.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01037, ecapa_loss=0.0001437, whisper_loss=0.09053, over 3854322.30 frames. ], batch size: 89, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:06:59,070 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 09:07:10,932 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4392090.0, ans=0.125 2024-08-19 09:07:18,939 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 09:07:25,451 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 31 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 09:07:42,479 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.959e+01 2.265e+01 2.525e+01 2.711e+01 1.200e+02, threshold=5.050e+01, percent-clipped=1.0 2024-08-19 09:07:45,503 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.97 vs. 
limit=15.0 2024-08-19 09:07:50,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4392290.0, ans=0.1 2024-08-19 09:07:56,938 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 26 from Vox, 37 fro AS 2024-08-19 09:08:10,957 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 36 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 09:08:11,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4392390.0, ans=0.125 2024-08-19 09:08:13,353 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9150, loss[loss=0.09252, beats_loss=0.009364, ecapa_loss=0.0001511, whisper_loss=0.08165, over 18471.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001429, whisper_loss=0.09006, over 3878257.79 frames. ], batch size: 73, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:08:39,383 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.72 vs. 
limit=12.0 2024-08-19 09:08:56,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4392690.0, ans=0.0 2024-08-19 09:09:00,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4392790.0, ans=0.125 2024-08-19 09:09:08,990 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4392790.0, ans=0.0 2024-08-19 09:09:15,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4392890.0, ans=0.125 2024-08-19 09:09:15,031 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4392890.0, ans=0.04949747468305833 2024-08-19 09:09:23,203 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4392890.0, ans=0.125 2024-08-19 09:09:23,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4392890.0, ans=0.125 2024-08-19 09:09:28,369 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9200, loss[loss=0.09506, beats_loss=0.01061, ecapa_loss=0.0001206, whisper_loss=0.08324, over 18211.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.0104, ecapa_loss=0.0001429, whisper_loss=0.08985, over 3878454.28 frames. 
], batch size: 70, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:09:37,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4392990.0, ans=0.2 2024-08-19 09:09:44,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4393090.0, ans=0.125 2024-08-19 09:09:47,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4393090.0, ans=0.125 2024-08-19 09:09:54,653 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 09:09:54,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4393090.0, ans=0.125 2024-08-19 09:10:00,778 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 25 from Vox, 27 fro AS 2024-08-19 09:10:06,183 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 23 from LS+wenet, 16 from Vox, 20 fro AS 2024-08-19 09:10:08,113 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4393190.0, ans=0.0 2024-08-19 09:10:11,935 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.633e+01 2.355e+01 2.575e+01 2.865e+01 1.533e+02, threshold=5.149e+01, percent-clipped=1.0 2024-08-19 09:10:26,998 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4393390.0, ans=0.07 2024-08-19 09:10:31,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4393390.0, ans=0.125 2024-08-19 09:10:35,449 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 09:10:40,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4393490.0, ans=0.0 2024-08-19 09:10:41,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9250, loss[loss=0.09642, beats_loss=0.01018, ecapa_loss=0.0001452, whisper_loss=0.08479, over 20641.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01043, ecapa_loss=0.0001435, whisper_loss=0.08969, over 3912193.35 frames. ], batch size: 83, lr: 2.04e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:10:42,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4393490.0, ans=0.0 2024-08-19 09:10:44,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4393490.0, ans=0.07 2024-08-19 09:10:55,045 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 32 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 09:11:02,436 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 21 from LS+wenet, 24 from Vox, 46 fro AS 2024-08-19 09:11:06,998 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 09:11:07,957 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 09:11:27,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4393790.0, ans=0.125 2024-08-19 09:11:31,897 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. 
limit=15.0 2024-08-19 09:11:35,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4393790.0, ans=0.125 2024-08-19 09:11:55,209 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9300, loss[loss=0.125, beats_loss=0.008838, ecapa_loss=0.0001686, whisper_loss=0.1144, over 17134.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01042, ecapa_loss=0.0001437, whisper_loss=0.0895, over 3920160.33 frames. ], batch size: 68, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:11:55,931 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 09:12:02,554 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 15 from Vox, 46 fro AS 2024-08-19 09:12:08,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4394090.0, ans=0.1 2024-08-19 09:12:12,986 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 09:12:17,086 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 31 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 09:12:32,152 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 09:12:32,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4394190.0, ans=0.0 2024-08-19 09:12:37,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4394290.0, ans=0.0 2024-08-19 09:12:38,104 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.989e+01 2.432e+01 2.581e+01 2.836e+01 1.723e+02, threshold=5.163e+01, percent-clipped=2.0 2024-08-19 09:12:49,626 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
15 from LS+wenet, 19 from Vox, 20 fro AS 2024-08-19 09:12:50,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4394290.0, ans=0.125 2024-08-19 09:12:50,482 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5 2024-08-19 09:12:52,273 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 34 from Vox, 31 fro AS 2024-08-19 09:12:57,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4394390.0, ans=0.2 2024-08-19 09:12:58,479 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4394390.0, ans=0.125 2024-08-19 09:13:02,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4394390.0, ans=0.125 2024-08-19 09:13:05,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=4394490.0, ans=0.125 2024-08-19 09:13:06,077 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9350, loss[loss=0.09133, beats_loss=0.01254, ecapa_loss=0.0001189, whisper_loss=0.07761, over 17780.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01048, ecapa_loss=0.0001438, whisper_loss=0.0888, over 3894337.46 frames. ], batch size: 68, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:13:26,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4394590.0, ans=0.0 2024-08-19 09:13:28,594 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 09:13:33,827 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 09:14:13,915 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9400, loss[loss=0.09358, beats_loss=0.01038, ecapa_loss=0.0001704, whisper_loss=0.0815, over 18403.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01049, ecapa_loss=0.0001424, whisper_loss=0.08891, over 3915174.84 frames. ], batch size: 75, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:14:21,767 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2024-08-19 09:14:22,273 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 25 from LS+wenet, 28 from Vox, 38 fro AS 2024-08-19 09:14:36,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4395090.0, ans=0.125 2024-08-19 09:14:39,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.62 vs. limit=15.0 2024-08-19 09:14:50,809 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 09:14:54,365 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.883e+01 2.333e+01 2.565e+01 2.846e+01 4.265e+02, threshold=5.130e+01, percent-clipped=2.0 2024-08-19 09:15:02,505 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 18 from Vox, 22 fro AS 2024-08-19 09:15:12,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4395390.0, ans=0.125 2024-08-19 09:15:19,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9450, loss[loss=0.1224, beats_loss=0.009068, ecapa_loss=0.0001839, whisper_loss=0.1115, over 21879.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01044, ecapa_loss=0.0001429, whisper_loss=0.08911, over 3883038.78 frames. 
], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:15:20,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4395490.0, ans=0.035 2024-08-19 09:15:28,002 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 09:15:40,655 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.63 vs. limit=15.0 2024-08-19 09:15:43,970 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 09:15:44,284 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4395590.0, ans=0.025 2024-08-19 09:15:46,321 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 09:16:26,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9500, loss[loss=0.1128, beats_loss=0.01026, ecapa_loss=0.0001777, whisper_loss=0.1007, over 13252.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01044, ecapa_loss=0.0001417, whisper_loss=0.08937, over 3879989.13 frames. ], batch size: 55, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:16:33,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4395990.0, ans=0.125 2024-08-19 09:16:39,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2024-08-19 09:16:46,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4396090.0, ans=0.2 2024-08-19 09:16:47,999 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
26 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 09:17:06,172 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.770e+01 2.270e+01 2.577e+01 2.905e+01 4.057e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-19 09:17:06,654 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4396290.0, ans=0.125 2024-08-19 09:17:25,486 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4396390.0, ans=0.125 2024-08-19 09:17:27,615 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 21 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 09:17:32,744 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9550, loss[loss=0.107, beats_loss=0.009508, ecapa_loss=0.000158, whisper_loss=0.0959, over 15251.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001422, whisper_loss=0.08904, over 3908540.99 frames. ], batch size: 59, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:17:48,773 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.89 vs. limit=15.0 2024-08-19 09:17:52,093 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=4396590.0, ans=0.125 2024-08-19 09:18:13,572 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 24 from LS+wenet, 25 from Vox, 24 fro AS 2024-08-19 09:18:16,826 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.18 vs. limit=15.0 2024-08-19 09:18:37,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9600, loss[loss=0.1021, beats_loss=0.01131, ecapa_loss=0.0001343, whisper_loss=0.08944, over 22345.00 frames. 
], tot_loss[loss=0.1005, beats_loss=0.01044, ecapa_loss=0.0001433, whisper_loss=0.08861, over 3841836.25 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:18:43,334 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 16 from LS+wenet, 29 from Vox, 30 fro AS 2024-08-19 09:18:44,660 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 09:18:48,358 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 14 from LS+wenet, 13 from Vox, 26 fro AS 2024-08-19 09:18:52,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4397090.0, ans=0.0 2024-08-19 09:19:00,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4397090.0, ans=0.125 2024-08-19 09:19:07,355 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 20 from LS+wenet, 16 from Vox, 26 fro AS 2024-08-19 09:19:15,186 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 17 from LS+wenet, 24 from Vox, 30 fro AS 2024-08-19 09:19:17,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4397290.0, ans=0.0 2024-08-19 09:19:17,720 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.332e+01 2.537e+01 2.796e+01 5.515e+01, threshold=5.073e+01, percent-clipped=1.0 2024-08-19 09:19:39,333 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4397390.0, ans=0.2 2024-08-19 09:19:44,240 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 23 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 09:19:47,039 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9650, loss[loss=0.106, beats_loss=0.009792, ecapa_loss=0.0001305, whisper_loss=0.09493, over 17236.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01035, ecapa_loss=0.0001428, whisper_loss=0.08942, over 3826862.04 frames. ], batch size: 67, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:19:56,748 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 09:20:07,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4397590.0, ans=0.125 2024-08-19 09:20:29,305 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 24 from Vox, 20 fro AS 2024-08-19 09:20:39,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4397790.0, ans=0.2 2024-08-19 09:20:57,064 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9700, loss[loss=0.1023, beats_loss=0.0106, ecapa_loss=0.0001832, whisper_loss=0.08983, over 17171.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0104, ecapa_loss=0.0001429, whisper_loss=0.0897, over 3823891.21 frames. ], batch size: 72, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:21:06,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4397990.0, ans=0.0 2024-08-19 09:21:11,495 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 09:21:22,405 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 24 from LS+wenet, 18 from Vox, 16 fro AS 2024-08-19 09:21:27,078 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2024-08-19 09:21:32,208 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4398190.0, ans=0.125 2024-08-19 09:21:36,699 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
21 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 09:21:37,818 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.414e+01 2.657e+01 3.096e+01 1.946e+02, threshold=5.314e+01, percent-clipped=1.0 2024-08-19 09:21:40,219 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.96 vs. limit=15.0 2024-08-19 09:21:41,394 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-08-19 09:21:46,916 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.31 vs. limit=8.0 2024-08-19 09:21:47,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4398290.0, ans=0.125 2024-08-19 09:22:04,339 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9750, loss[loss=0.1053, beats_loss=0.009408, ecapa_loss=0.0001283, whisper_loss=0.0946, over 21802.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01034, ecapa_loss=0.0001433, whisper_loss=0.08972, over 3817909.31 frames. ], batch size: 86, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:22:37,971 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 09:23:00,161 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4398890.0, ans=0.0 2024-08-19 09:23:02,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4398890.0, ans=0.05 2024-08-19 09:23:08,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9800, loss[loss=0.1014, beats_loss=0.01113, ecapa_loss=0.000144, whisper_loss=0.08883, over 22803.00 frames. 
], tot_loss[loss=0.1014, beats_loss=0.01045, ecapa_loss=0.0001428, whisper_loss=0.08948, over 3839205.47 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:23:13,017 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4398990.0, ans=0.125 2024-08-19 09:23:32,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4399090.0, ans=0.1 2024-08-19 09:23:34,610 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.14 vs. limit=15.0 2024-08-19 09:23:36,431 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 09:23:47,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-19 09:23:47,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.301e+01 2.526e+01 2.757e+01 3.952e+01, threshold=5.052e+01, percent-clipped=0.0 2024-08-19 09:23:58,925 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4399290.0, ans=0.1 2024-08-19 09:24:13,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9850, loss[loss=0.07314, beats_loss=0.01357, ecapa_loss=0.0001229, whisper_loss=0.05834, over 19430.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01047, ecapa_loss=0.0001417, whisper_loss=0.0896, over 3864586.60 frames. 
], batch size: 78, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 09:24:27,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4399590.0, ans=0.125 2024-08-19 09:24:28,558 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 16 from Vox, 42 fro AS 2024-08-19 09:24:41,197 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=5.327e+01 2024-08-19 09:24:43,208 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 09:24:55,433 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:25:05,999 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 09:25:16,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4399990.0, ans=0.1 2024-08-19 09:25:17,628 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9900, loss[loss=0.09811, beats_loss=0.01273, ecapa_loss=0.0001505, whisper_loss=0.08387, over 21334.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001422, whisper_loss=0.08964, over 3874146.99 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:25:21,680 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-19 09:25:44,209 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 37 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 09:25:50,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2024-08-19 09:25:52,136 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
17 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 09:25:56,063 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 09:25:59,912 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.262e+01 2.560e+01 2.824e+01 4.177e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 09:26:00,318 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 09:26:02,558 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 23 from LS+wenet, 18 from Vox, 37 fro AS 2024-08-19 09:26:18,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4400390.0, ans=0.0 2024-08-19 09:26:19,675 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 09:26:25,644 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 9950, loss[loss=0.09697, beats_loss=0.009772, ecapa_loss=0.0001399, whisper_loss=0.0858, over 21503.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.000143, whisper_loss=0.08957, over 3877443.70 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:26:38,719 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.530e-02 2024-08-19 09:26:40,844 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-19 09:26:49,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2024-08-19 09:26:57,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=4400690.0, ans=0.125 2024-08-19 09:27:14,149 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
31 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-19 09:27:21,880 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2024-08-19 09:27:25,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=4400890.0, ans=0.04949747468305833 2024-08-19 09:27:26,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4400890.0, ans=0.1 2024-08-19 09:27:28,306 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2024-08-19 09:27:32,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10000, loss[loss=0.1131, beats_loss=0.008956, ecapa_loss=0.0001603, whisper_loss=0.1026, over 17209.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01049, ecapa_loss=0.0001421, whisper_loss=0.08962, over 3872765.20 frames. ], batch size: 68, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:27:39,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4400990.0, ans=0.2 2024-08-19 09:27:41,415 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
20 from LS+wenet, 24 from Vox, 29 fro AS 2024-08-19 09:27:55,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4401090.0, ans=0.0 2024-08-19 09:28:04,888 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4401190.0, ans=0.125 2024-08-19 09:28:06,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4401190.0, ans=0.2 2024-08-19 09:28:13,419 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.203e+01 2.418e+01 2.701e+01 3.828e+01, threshold=4.836e+01, percent-clipped=0.0 2024-08-19 09:28:17,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4401290.0, ans=0.2 2024-08-19 09:28:17,171 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4401290.0, ans=0.1 2024-08-19 09:28:18,318 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 25 from Vox, 41 fro AS 2024-08-19 09:28:18,548 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4401290.0, ans=0.0 2024-08-19 09:28:37,476 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 23 from Vox, 45 fro AS 2024-08-19 09:28:40,093 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10050, loss[loss=0.06885, beats_loss=0.01239, ecapa_loss=0.0001375, whisper_loss=0.05508, over 14419.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001411, whisper_loss=0.08938, over 3892171.54 frames. 
], batch size: 56, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:28:43,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4401490.0, ans=0.125 2024-08-19 09:28:47,247 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4401490.0, ans=0.2 2024-08-19 09:28:49,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4401490.0, ans=0.0 2024-08-19 09:29:05,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4401590.0, ans=0.0 2024-08-19 09:29:18,442 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 09:29:27,865 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 22 from LS+wenet, 19 from Vox, 27 fro AS 2024-08-19 09:29:28,258 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=4.318e+01 2024-08-19 09:29:35,317 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 32 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-19 09:29:36,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4401890.0, ans=0.125 2024-08-19 09:29:45,535 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10100, loss[loss=0.1195, beats_loss=0.009459, ecapa_loss=0.000156, whisper_loss=0.1084, over 22714.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.000143, whisper_loss=0.08953, over 3921461.35 frames. ], batch size: 94, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:29:57,462 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
29 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 09:30:03,167 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4402090.0, ans=0.0 2024-08-19 09:30:10,801 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4402190.0, ans=0.0 2024-08-19 09:30:27,049 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.294e+01 2.554e+01 2.790e+01 3.607e+01, threshold=5.108e+01, percent-clipped=0.0 2024-08-19 09:30:28,426 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4402290.0, ans=0.0 2024-08-19 09:30:31,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4402290.0, ans=0.125 2024-08-19 09:30:33,179 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 09:30:38,627 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=4402290.0, ans=22.5 2024-08-19 09:30:57,293 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 09:30:58,393 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10150, loss[loss=0.1029, beats_loss=0.01139, ecapa_loss=0.0001272, whisper_loss=0.09022, over 23128.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01051, ecapa_loss=0.0001427, whisper_loss=0.08988, over 3927533.37 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:31:27,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4402690.0, ans=0.125 2024-08-19 09:31:34,162 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
22 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 09:31:35,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4402690.0, ans=0.0 2024-08-19 09:31:40,923 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 09:31:44,165 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 09:32:00,191 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 09:32:12,448 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10200, loss[loss=0.1009, beats_loss=0.01073, ecapa_loss=0.000178, whisper_loss=0.0884, over 21098.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01048, ecapa_loss=0.0001434, whisper_loss=0.08966, over 3925161.63 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:32:30,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4403090.0, ans=0.125 2024-08-19 09:32:40,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4403090.0, ans=0.125 2024-08-19 09:32:40,983 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 31 from Vox, 29 fro AS 2024-08-19 09:32:41,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=15.0 2024-08-19 09:32:54,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4403190.0, ans=0.0 2024-08-19 09:32:56,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.913e+01 2.315e+01 2.558e+01 2.832e+01 4.132e+01, threshold=5.117e+01, percent-clipped=0.0 2024-08-19 09:32:59,158 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
35 from LS+wenet, 18 from Vox, 39 fro AS 2024-08-19 09:32:59,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=22.5 2024-08-19 09:33:09,390 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 09:33:25,362 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10250, loss[loss=0.08182, beats_loss=0.01329, ecapa_loss=0.0001197, whisper_loss=0.06733, over 14334.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01041, ecapa_loss=0.0001436, whisper_loss=0.08983, over 3907482.82 frames. ], batch size: 57, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:33:37,777 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 22 from Vox, 37 fro AS 2024-08-19 09:33:54,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4403590.0, ans=0.125 2024-08-19 09:34:23,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4403790.0, ans=0.125 2024-08-19 09:34:46,035 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 34 from LS+wenet, 11 from Vox, 16 fro AS 2024-08-19 09:34:50,232 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10300, loss[loss=0.08968, beats_loss=0.01071, ecapa_loss=0.0001186, whisper_loss=0.07779, over 19695.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01047, ecapa_loss=0.0001428, whisper_loss=0.08938, over 3920975.14 frames. ], batch size: 78, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:34:53,109 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
29 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 09:35:19,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4404090.0, ans=0.2 2024-08-19 09:35:42,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.884e+01 2.424e+01 2.703e+01 3.011e+01 5.965e+01, threshold=5.405e+01, percent-clipped=1.0 2024-08-19 09:35:42,459 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 25 from Vox, 23 fro AS 2024-08-19 09:35:47,840 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 24 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 09:35:51,859 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 24 from Vox, 28 fro AS 2024-08-19 09:35:59,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.65 vs. limit=10.0 2024-08-19 09:36:13,106 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 09:36:20,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10350, loss[loss=0.1153, beats_loss=0.009276, ecapa_loss=0.0001432, whisper_loss=0.1046, over 22155.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01056, ecapa_loss=0.000143, whisper_loss=0.08857, over 3890977.70 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:36:26,660 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4404490.0, ans=0.07 2024-08-19 09:36:31,567 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2024-08-19 09:36:32,167 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
20 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 09:36:34,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4404490.0, ans=0.125 2024-08-19 09:36:46,905 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 14 from Vox, 24 fro AS 2024-08-19 09:36:50,828 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-08-19 09:37:08,003 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.61 vs. limit=22.5 2024-08-19 09:37:14,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4404790.0, ans=0.125 2024-08-19 09:37:18,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4404790.0, ans=0.1 2024-08-19 09:37:21,877 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4404790.0, ans=0.125 2024-08-19 09:37:34,143 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.74 vs. limit=6.0 2024-08-19 09:37:52,728 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10400, loss[loss=0.1034, beats_loss=0.01127, ecapa_loss=0.0001171, whisper_loss=0.091, over 16446.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.0105, ecapa_loss=0.0001426, whisper_loss=0.08881, over 3879921.03 frames. ], batch size: 64, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:37:58,178 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 09:38:01,313 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
25 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 09:38:03,605 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 23 from LS+wenet, 20 from Vox, 20 fro AS 2024-08-19 09:38:03,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4404990.0, ans=0.0 2024-08-19 09:38:20,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4405090.0, ans=0.2 2024-08-19 09:38:26,076 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4405090.0, ans=0.0 2024-08-19 09:38:38,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.94 vs. limit=5.0 2024-08-19 09:38:42,649 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 16 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 09:38:46,998 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.316e+01 2.550e+01 2.840e+01 5.090e+01, threshold=5.101e+01, percent-clipped=0.0 2024-08-19 09:38:51,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=4405290.0, ans=0.2 2024-08-19 09:38:53,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4405290.0, ans=0.0 2024-08-19 09:39:07,951 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4405390.0, ans=0.0 2024-08-19 09:39:13,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10450, loss[loss=0.1037, beats_loss=0.01174, ecapa_loss=0.0001368, whisper_loss=0.09058, over 18627.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01051, ecapa_loss=0.0001434, whisper_loss=0.08908, over 3853761.33 frames. 
], batch size: 75, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:39:15,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4405490.0, ans=0.125 2024-08-19 09:39:20,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4405490.0, ans=0.0 2024-08-19 09:39:28,208 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 17 from LS+wenet, 22 from Vox, 23 fro AS 2024-08-19 09:39:49,080 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 09:39:53,477 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4405690.0, ans=0.1 2024-08-19 09:40:03,008 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 09:40:03,731 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0 2024-08-19 09:40:05,634 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 09:40:08,205 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 09:40:23,383 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10500, loss[loss=0.09116, beats_loss=0.00941, ecapa_loss=0.0001633, whisper_loss=0.08012, over 17052.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001433, whisper_loss=0.08964, over 3884472.58 frames. 
], batch size: 72, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:40:28,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4405990.0, ans=0.125 2024-08-19 09:40:31,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4405990.0, ans=0.125 2024-08-19 09:40:34,301 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 22 from LS+wenet, 12 from Vox, 33 fro AS 2024-08-19 09:40:52,511 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=15.0 2024-08-19 09:41:02,925 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.686e+01 2.206e+01 2.360e+01 2.626e+01 1.632e+02, threshold=4.720e+01, percent-clipped=1.0 2024-08-19 09:41:20,085 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 16 from Vox, 45 fro AS 2024-08-19 09:41:25,899 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 09:41:29,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10550, loss[loss=0.09633, beats_loss=0.01099, ecapa_loss=0.0001478, whisper_loss=0.08386, over 23859.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01044, ecapa_loss=0.000144, whisper_loss=0.0894, over 3852188.96 frames. ], batch size: 96, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:41:36,955 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=4406490.0, ans=0.2 2024-08-19 09:41:45,525 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 09:41:47,167 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
18 from LS+wenet, 17 from Vox, 22 fro AS 2024-08-19 09:41:50,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.50 vs. limit=22.5 2024-08-19 09:42:02,447 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 18 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 09:42:17,088 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 09:42:18,621 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 35 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 09:42:20,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-08-19 09:42:22,226 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 19 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 09:42:38,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4406890.0, ans=0.125 2024-08-19 09:42:39,741 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4406990.0, ans=0.125 2024-08-19 09:42:40,598 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10600, loss[loss=0.07649, beats_loss=0.0146, ecapa_loss=0.000144, whisper_loss=0.06045, over 18669.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0104, ecapa_loss=0.0001445, whisper_loss=0.08915, over 3854225.67 frames. 
], batch size: 79, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:42:56,468 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4407090.0, ans=0.125 2024-08-19 09:43:12,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4407190.0, ans=0.125 2024-08-19 09:43:20,422 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4407190.0, ans=0.0 2024-08-19 09:43:22,873 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.874e+01 2.325e+01 2.532e+01 2.911e+01 7.295e+01, threshold=5.064e+01, percent-clipped=1.0 2024-08-19 09:43:31,257 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4407290.0, ans=0.1 2024-08-19 09:43:48,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4407390.0, ans=0.0 2024-08-19 09:43:50,719 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10650, loss[loss=0.09782, beats_loss=0.01016, ecapa_loss=0.0001267, whisper_loss=0.08639, over 21660.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01043, ecapa_loss=0.0001429, whisper_loss=0.0888, over 3848985.06 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:44:01,256 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 09:44:09,840 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 25 from Vox, 40 fro AS 2024-08-19 09:44:19,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4407690.0, ans=0.125 2024-08-19 09:44:40,135 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
15 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 09:44:42,804 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 09:44:44,415 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 29 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 09:44:58,666 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 09:44:59,449 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=12.0 2024-08-19 09:45:02,608 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10700, loss[loss=0.1175, beats_loss=0.01093, ecapa_loss=0.0001227, whisper_loss=0.1054, over 22545.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01046, ecapa_loss=0.0001425, whisper_loss=0.08917, over 3908895.60 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:45:21,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2024-08-19 09:45:43,426 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.961e+01 2.323e+01 2.559e+01 2.783e+01 4.084e+01, threshold=5.118e+01, percent-clipped=0.0 2024-08-19 09:45:51,296 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 30 from Vox, 21 fro AS 2024-08-19 09:46:00,769 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 09:46:09,981 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10750, loss[loss=0.1045, beats_loss=0.007771, ecapa_loss=0.0001486, whisper_loss=0.09524, over 14862.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01051, ecapa_loss=0.0001423, whisper_loss=0.08925, over 3909988.00 frames. 
], batch size: 58, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:46:14,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4408490.0, ans=0.125 2024-08-19 09:46:19,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4408490.0, ans=0.125 2024-08-19 09:46:36,662 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 12 from LS+wenet, 19 from Vox, 22 fro AS 2024-08-19 09:46:46,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4408690.0, ans=0.0 2024-08-19 09:46:52,925 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 29 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 09:47:10,601 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=4408890.0, ans=0.09899494936611666 2024-08-19 09:47:14,123 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10800, loss[loss=0.1049, beats_loss=0.01154, ecapa_loss=0.0001104, whisper_loss=0.09223, over 14547.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001414, whisper_loss=0.0895, over 3899511.55 frames. ], batch size: 54, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:47:14,273 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 14 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 09:47:17,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4408990.0, ans=0.1 2024-08-19 09:47:23,646 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. 
limit=6.0 2024-08-19 09:47:28,608 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 09:47:37,377 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.76 vs. limit=15.0 2024-08-19 09:47:45,929 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4409190.0, ans=0.0 2024-08-19 09:47:50,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=4409290.0, ans=0.07 2024-08-19 09:47:51,572 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.788e+01 2.386e+01 2.632e+01 3.001e+01 4.725e+01, threshold=5.265e+01, percent-clipped=0.0 2024-08-19 09:48:01,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4409290.0, ans=0.1 2024-08-19 09:48:01,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2024-08-19 09:48:03,186 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 09:48:04,148 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.57 vs. limit=22.5 2024-08-19 09:48:05,420 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4409390.0, ans=0.0 2024-08-19 09:48:08,401 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4409390.0, ans=0.0 2024-08-19 09:48:09,326 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
16 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 09:48:17,658 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10850, loss[loss=0.08914, beats_loss=0.01022, ecapa_loss=0.000157, whisper_loss=0.07736, over 17668.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01055, ecapa_loss=0.000142, whisper_loss=0.08918, over 3878255.53 frames. ], batch size: 72, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:48:24,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4409490.0, ans=0.0 2024-08-19 09:48:24,800 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4409490.0, ans=0.0 2024-08-19 09:48:27,342 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4409490.0, ans=0.1 2024-08-19 09:48:31,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4409590.0, ans=0.125 2024-08-19 09:48:47,865 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4409690.0, ans=0.125 2024-08-19 09:48:53,200 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2024-08-19 09:49:06,076 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 09:49:14,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4409890.0, ans=0.125 2024-08-19 09:49:17,587 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
32 from LS+wenet, 25 from Vox, 34 fro AS 2024-08-19 09:49:21,259 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10900, loss[loss=0.1133, beats_loss=0.01034, ecapa_loss=0.0001517, whisper_loss=0.1015, over 20890.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01053, ecapa_loss=0.000142, whisper_loss=0.08942, over 3893477.65 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:49:32,951 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 26 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 09:49:38,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4410090.0, ans=0.125 2024-08-19 09:49:59,520 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.779e+01 2.400e+01 2.615e+01 2.977e+01 1.064e+02, threshold=5.230e+01, percent-clipped=2.0 2024-08-19 09:50:00,517 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2024-08-19 09:50:06,602 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.41 vs. limit=10.0 2024-08-19 09:50:19,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4410390.0, ans=0.0 2024-08-19 09:50:19,533 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.73 vs. limit=10.0 2024-08-19 09:50:21,383 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 21 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 09:50:21,694 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4410390.0, ans=0.125 2024-08-19 09:50:22,598 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
23 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 09:50:25,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 10950, loss[loss=0.1028, beats_loss=0.01092, ecapa_loss=0.0001244, whisper_loss=0.09063, over 17235.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01051, ecapa_loss=0.0001422, whisper_loss=0.09023, over 3899204.07 frames. ], batch size: 67, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:50:40,774 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2024-08-19 09:50:53,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4410690.0, ans=0.0 2024-08-19 09:51:04,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4410790.0, ans=0.0 2024-08-19 09:51:26,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4410890.0, ans=0.05 2024-08-19 09:51:30,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11000, loss[loss=0.1046, beats_loss=0.008648, ecapa_loss=0.0001328, whisper_loss=0.0946, over 21768.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0105, ecapa_loss=0.0001425, whisper_loss=0.08948, over 3925754.40 frames. ], batch size: 84, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:51:32,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.59 vs. limit=10.0 2024-08-19 09:51:33,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4410990.0, ans=0.0 2024-08-19 09:51:48,185 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 09:51:51,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2024-08-19 09:51:56,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4411090.0, ans=0.125 2024-08-19 09:52:07,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4411190.0, ans=0.0 2024-08-19 09:52:12,344 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.368e+01 2.497e+01 2.768e+01 3.279e+02, threshold=4.993e+01, percent-clipped=1.0 2024-08-19 09:52:20,017 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2024-08-19 09:52:27,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4411390.0, ans=0.1 2024-08-19 09:52:38,971 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11050, loss[loss=0.1023, beats_loss=0.01063, ecapa_loss=0.000174, whisper_loss=0.08995, over 21029.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01051, ecapa_loss=0.0001437, whisper_loss=0.08998, over 3912158.32 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:52:41,711 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 29 from LS+wenet, 30 from Vox, 35 fro AS 2024-08-19 09:52:48,263 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 20 from LS+wenet, 15 from Vox, 38 fro AS 2024-08-19 09:52:53,583 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 24 from Vox, 22 fro AS 2024-08-19 09:52:55,181 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4411590.0, ans=0.0 2024-08-19 09:53:04,294 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4411690.0, ans=0.125 2024-08-19 09:53:16,997 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 25 from LS+wenet, 35 from Vox, 35 fro AS 2024-08-19 09:53:26,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4411790.0, ans=0.0 2024-08-19 09:53:38,230 WARNING [optim.py:496] (3/4) Scaling gradients by 0.06586580723524094, model_norm_threshold=49.93263626098633 2024-08-19 09:53:38,407 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.23, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.337e+05, grad_sumsq=1.337e+05, orig_rms_sq=1.000e+00 2024-08-19 09:53:38,532 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 20 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 09:53:45,249 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11100, loss[loss=0.1011, beats_loss=0.01018, ecapa_loss=0.00013, whisper_loss=0.08958, over 19309.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001434, whisper_loss=0.09049, over 3917872.16 frames. 
], batch size: 73, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:53:56,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4411990.0, ans=0.2 2024-08-19 09:54:05,277 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4412090.0, ans=0.125 2024-08-19 09:54:28,471 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.940e+01 2.398e+01 2.719e+01 3.101e+01 7.581e+02, threshold=5.438e+01, percent-clipped=4.0 2024-08-19 09:54:38,036 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4412290.0, ans=0.125 2024-08-19 09:54:40,355 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 09:54:47,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4412390.0, ans=0.125 2024-08-19 09:54:48,343 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4412390.0, ans=0.0 2024-08-19 09:54:54,243 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 18 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 09:54:58,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11150, loss[loss=0.09814, beats_loss=0.01096, ecapa_loss=0.000142, whisper_loss=0.08576, over 22099.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.09052, over 3942570.00 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:55:09,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4412490.0, ans=0.2 2024-08-19 09:55:13,628 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
13 from LS+wenet, 18 from Vox, 24 fro AS 2024-08-19 09:55:34,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4412690.0, ans=0.0 2024-08-19 09:55:36,937 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4412690.0, ans=0.125 2024-08-19 09:55:36,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4412690.0, ans=0.125 2024-08-19 09:55:55,869 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 25 from LS+wenet, 20 from Vox, 29 fro AS 2024-08-19 09:55:57,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4412890.0, ans=0.125 2024-08-19 09:56:09,786 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11200, loss[loss=0.1164, beats_loss=0.008703, ecapa_loss=0.0001469, whisper_loss=0.1062, over 19243.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01029, ecapa_loss=0.0001423, whisper_loss=0.09152, over 3948529.00 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:56:11,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4412990.0, ans=0.0 2024-08-19 09:56:14,202 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.66 vs. 
limit=15.0 2024-08-19 09:56:22,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4413090.0, ans=0.1 2024-08-19 09:56:44,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4413190.0, ans=0.2 2024-08-19 09:56:45,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5 2024-08-19 09:56:46,577 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.61 vs. limit=10.0 2024-08-19 09:56:50,877 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.919e+01 2.289e+01 2.521e+01 2.778e+01 3.744e+01, threshold=5.043e+01, percent-clipped=0.0 2024-08-19 09:56:51,628 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2024-08-19 09:57:09,202 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4413390.0, ans=0.0 2024-08-19 09:57:14,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4413390.0, ans=0.1 2024-08-19 09:57:20,484 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11250, loss[loss=0.08994, beats_loss=0.01117, ecapa_loss=0.0001513, whisper_loss=0.07726, over 22010.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01029, ecapa_loss=0.0001423, whisper_loss=0.0912, over 3932387.87 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:57:23,172 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 33 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 09:57:24,910 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 
26 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 09:57:35,780 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.39 vs. limit=15.0 2024-08-19 09:57:36,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4413590.0, ans=0.125 2024-08-19 09:57:51,956 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4413690.0, ans=0.1 2024-08-19 09:58:07,958 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2024-08-19 09:58:22,656 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 10 from Vox, 25 fro AS 2024-08-19 09:58:24,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=4413890.0, ans=0.5 2024-08-19 09:58:28,596 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11300, loss[loss=0.1043, beats_loss=0.008008, ecapa_loss=0.0002, whisper_loss=0.09425, over 18539.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01028, ecapa_loss=0.0001426, whisper_loss=0.0912, over 3909649.75 frames. ], batch size: 81, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:58:29,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4413990.0, ans=0.2 2024-08-19 09:58:30,333 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 21 from Vox, 19 fro AS 2024-08-19 09:58:35,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2024-08-19 09:58:51,351 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
20 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 09:58:51,770 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.87 vs. limit=10.0 2024-08-19 09:58:55,300 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4414190.0, ans=0.0 2024-08-19 09:58:56,889 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0 2024-08-19 09:59:04,260 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4414190.0, ans=0.125 2024-08-19 09:59:08,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.325e+01 2.531e+01 2.748e+01 4.040e+01, threshold=5.062e+01, percent-clipped=0.0 2024-08-19 09:59:12,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4414290.0, ans=0.1 2024-08-19 09:59:19,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4414290.0, ans=0.125 2024-08-19 09:59:34,833 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11350, loss[loss=0.1049, beats_loss=0.01019, ecapa_loss=0.0001725, whisper_loss=0.09296, over 21797.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01024, ecapa_loss=0.0001435, whisper_loss=0.09089, over 3879416.39 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 09:59:36,460 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
23 from LS+wenet, 21 from Vox, 24 fro AS 2024-08-19 09:59:41,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4414490.0, ans=0.125 2024-08-19 09:59:46,605 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4414590.0, ans=10.0 2024-08-19 09:59:51,749 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4414590.0, ans=0.125 2024-08-19 10:00:00,873 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=15.0 2024-08-19 10:00:18,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4414790.0, ans=0.125 2024-08-19 10:00:24,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4414890.0, ans=10.0 2024-08-19 10:00:25,007 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.74 vs. limit=15.0 2024-08-19 10:00:28,329 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.00 vs. limit=10.0 2024-08-19 10:00:37,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11400, loss[loss=0.1157, beats_loss=0.01153, ecapa_loss=0.0001438, whisper_loss=0.1027, over 19501.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01026, ecapa_loss=0.0001433, whisper_loss=0.09119, over 3883047.44 frames. ], batch size: 76, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:00:38,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.39 vs. 
limit=22.5 2024-08-19 10:00:48,144 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:00:58,891 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 23 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 10:01:10,716 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.380e+00 2024-08-19 10:01:13,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4415190.0, ans=0.07 2024-08-19 10:01:15,084 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.359e+01 2.595e+01 2.973e+01 3.861e+01, threshold=5.190e+01, percent-clipped=0.0 2024-08-19 10:01:24,490 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.89 vs. limit=6.0 2024-08-19 10:01:39,602 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11450, loss[loss=0.1084, beats_loss=0.01099, ecapa_loss=0.0001306, whisper_loss=0.0961, over 21369.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01028, ecapa_loss=0.0001432, whisper_loss=0.09123, over 3898732.28 frames. ], batch size: 86, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:01:45,100 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.817e-02 2024-08-19 10:01:55,220 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4415590.0, ans=0.0 2024-08-19 10:01:59,887 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 10:02:13,663 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4415690.0, ans=0.0 2024-08-19 10:02:17,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4415790.0, ans=0.125 2024-08-19 10:02:23,488 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 17 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 10:02:38,421 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4415890.0, ans=0.125 2024-08-19 10:02:41,947 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11500, loss[loss=0.1297, beats_loss=0.007563, ecapa_loss=0.0001462, whisper_loss=0.1207, over 18462.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001436, whisper_loss=0.0911, over 3905568.43 frames. ], batch size: 68, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:02:42,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4415990.0, ans=0.125 2024-08-19 10:02:52,269 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4415990.0, ans=0.0 2024-08-19 10:03:18,813 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.451e+01 2.614e+01 2.882e+01 3.813e+01, threshold=5.229e+01, percent-clipped=0.0 2024-08-19 10:03:21,984 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 10:03:40,067 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
25 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 10:03:41,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4416390.0, ans=10.0 2024-08-19 10:03:42,701 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 10:03:43,769 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11550, loss[loss=0.09894, beats_loss=0.01056, ecapa_loss=0.0001794, whisper_loss=0.08659, over 15731.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01028, ecapa_loss=0.0001443, whisper_loss=0.09069, over 3890655.95 frames. ], batch size: 67, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:03:57,186 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 20 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 10:04:04,164 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.84 vs. limit=15.0 2024-08-19 10:04:04,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4416590.0, ans=0.125 2024-08-19 10:04:09,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4416690.0, ans=0.2 2024-08-19 10:04:39,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4416890.0, ans=0.125 2024-08-19 10:04:45,843 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11600, loss[loss=0.107, beats_loss=0.01104, ecapa_loss=0.0001492, whisper_loss=0.09447, over 20949.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01031, ecapa_loss=0.0001439, whisper_loss=0.09051, over 3887224.12 frames. 
], batch size: 88, lr: 2.03e-03, grad_scale: 5.764607523034235e+17 2024-08-19 10:04:47,693 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2024-08-19 10:04:47,866 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.11 vs. limit=15.0 2024-08-19 10:04:58,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4417090.0, ans=0.0 2024-08-19 10:05:02,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4417090.0, ans=0.125 2024-08-19 10:05:07,520 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 10:05:17,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4417190.0, ans=0.2 2024-08-19 10:05:22,511 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.876e+01 2.308e+01 2.624e+01 2.875e+01 7.013e+01, threshold=5.249e+01, percent-clipped=1.0 2024-08-19 10:05:26,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4417290.0, ans=0.0 2024-08-19 10:05:31,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4417290.0, ans=0.125 2024-08-19 10:05:40,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4417390.0, ans=0.125 2024-08-19 10:05:43,650 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4417390.0, ans=0.2 2024-08-19 10:05:44,654 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
16 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-19 10:05:48,083 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11650, loss[loss=0.09183, beats_loss=0.01153, ecapa_loss=0.0001225, whisper_loss=0.07908, over 19396.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01045, ecapa_loss=0.0001434, whisper_loss=0.08982, over 3902088.12 frames. ], batch size: 80, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:06:00,613 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 10:06:36,376 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.98 vs. limit=15.0 2024-08-19 10:06:48,101 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 24 from Vox, 23 fro AS 2024-08-19 10:06:50,599 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11700, loss[loss=0.09716, beats_loss=0.009818, ecapa_loss=0.0001452, whisper_loss=0.08589, over 23072.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001433, whisper_loss=0.08957, over 3900420.69 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:06:53,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4417990.0, ans=0.125 2024-08-19 10:07:11,745 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4418090.0, ans=0.0 2024-08-19 10:07:27,002 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
23 from LS+wenet, 16 from Vox, 46 fro AS 2024-08-19 10:07:29,312 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.361e+01 2.643e+01 2.921e+01 4.842e+01, threshold=5.287e+01, percent-clipped=0.0 2024-08-19 10:07:39,491 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4418390.0, ans=0.125 2024-08-19 10:07:45,379 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 10:07:49,114 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 10:07:52,540 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11750, loss[loss=0.09764, beats_loss=0.007418, ecapa_loss=0.0001741, whisper_loss=0.08848, over 17884.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01062, ecapa_loss=0.0001433, whisper_loss=0.08935, over 3910912.66 frames. ], batch size: 73, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:08:04,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4418590.0, ans=0.125 2024-08-19 10:08:07,550 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 23 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 10:08:21,745 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. 
limit=15.0 2024-08-19 10:08:25,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4418690.0, ans=0.0 2024-08-19 10:08:33,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4418790.0, ans=0.0 2024-08-19 10:08:37,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4418790.0, ans=0.1 2024-08-19 10:08:43,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4418890.0, ans=0.125 2024-08-19 10:08:53,910 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11800, loss[loss=0.1018, beats_loss=0.01173, ecapa_loss=0.0001209, whisper_loss=0.08889, over 22351.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01064, ecapa_loss=0.0001423, whisper_loss=0.0893, over 3906367.92 frames. ], batch size: 90, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:08:54,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4418990.0, ans=0.0 2024-08-19 10:08:58,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4418990.0, ans=0.1 2024-08-19 10:09:13,976 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 30 from LS+wenet, 20 from Vox, 44 fro AS 2024-08-19 10:09:19,351 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4419190.0, ans=0.0 2024-08-19 10:09:23,975 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 
19 from LS+wenet, 22 from Vox, 54 fro AS 2024-08-19 10:09:32,070 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.601e+01 2.354e+01 2.552e+01 2.696e+01 6.400e+01, threshold=5.104e+01, percent-clipped=1.0 2024-08-19 10:09:34,037 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4419290.0, ans=0.125 2024-08-19 10:09:36,481 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4419290.0, ans=0.125 2024-08-19 10:09:39,119 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4419290.0, ans=0.125 2024-08-19 10:09:41,367 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4419290.0, ans=0.125 2024-08-19 10:09:55,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11850, loss[loss=0.1149, beats_loss=0.01076, ecapa_loss=0.000148, whisper_loss=0.1026, over 23164.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01072, ecapa_loss=0.0001413, whisper_loss=0.08866, over 3874898.25 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:09:56,091 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 10:10:01,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4419490.0, ans=0.125 2024-08-19 10:10:03,459 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4419490.0, ans=0.0 2024-08-19 10:10:05,694 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
25 from LS+wenet, 17 from Vox, 48 fro AS 2024-08-19 10:10:16,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4419590.0, ans=0.95 2024-08-19 10:10:22,417 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4419690.0, ans=0.125 2024-08-19 10:10:49,380 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 22 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 10:10:53,763 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4419890.0, ans=0.0 2024-08-19 10:10:58,361 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11900, loss[loss=0.09271, beats_loss=0.01403, ecapa_loss=0.0001062, whisper_loss=0.07762, over 22129.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.0106, ecapa_loss=0.0001417, whisper_loss=0.08938, over 3883767.76 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:11:14,949 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4420090.0, ans=0.125 2024-08-19 10:11:17,149 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 23 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 10:11:24,461 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 fro AS 2024-08-19 10:11:36,736 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.653e+01 2.301e+01 2.576e+01 2.913e+01 4.968e+01, threshold=5.153e+01, percent-clipped=0.0 2024-08-19 10:11:37,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4420290.0, ans=0.07 2024-08-19 10:11:46,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.72 vs. 
limit=15.0 2024-08-19 10:11:49,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4420390.0, ans=0.125 2024-08-19 10:11:52,237 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4420390.0, ans=0.0 2024-08-19 10:11:54,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4420390.0, ans=0.125 2024-08-19 10:12:00,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 11950, loss[loss=0.121, beats_loss=0.007163, ecapa_loss=0.0001881, whisper_loss=0.1119, over 15734.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01055, ecapa_loss=0.0001414, whisper_loss=0.08999, over 3893135.28 frames. ], batch size: 63, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:12:01,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4420490.0, ans=0.125 2024-08-19 10:12:02,485 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.429e+01 2024-08-19 10:12:02,664 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.65 vs. 
limit=6.0 2024-08-19 10:12:06,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4420490.0, ans=0.125 2024-08-19 10:12:09,035 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09744685143232346, model_norm_threshold=51.527191162109375 2024-08-19 10:12:09,201 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.20, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.481e+04, grad_sumsq=5.241e+06, orig_rms_sq=1.046e-02 2024-08-19 10:12:15,492 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.16 vs. limit=15.0 2024-08-19 10:12:17,053 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 16 from LS+wenet, 12 from Vox, 30 from AS 2024-08-19 10:12:22,003 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 38 from LS+wenet, 18 from Vox, 37 from AS 2024-08-19 10:12:33,310 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 36 from LS+wenet, 15 from Vox, 38 from AS 2024-08-19 10:12:52,124 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4420890.0, ans=0.0 2024-08-19 10:12:53,409 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4420890.0, ans=0.125 2024-08-19 10:12:54,143 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08515588939189911, model_norm_threshold=51.527191162109375 2024-08-19 10:12:54,306 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.out_norm.log_scale with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=4.362e+04, grad_sumsq=4.362e+04, orig_rms_sq=1.000e+00 2024-08-19 10:13:01,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs.
limit=15.0 2024-08-19 10:13:02,420 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.51 vs. limit=22.5 2024-08-19 10:13:02,909 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12000, loss[loss=0.0895, beats_loss=0.008944, ecapa_loss=0.0001332, whisper_loss=0.07922, over 17207.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01057, ecapa_loss=0.0001416, whisper_loss=0.08939, over 3895878.86 frames. ], batch size: 68, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:13:02,909 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 10:13:40,119 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on ASR_libri: loss=0.2551, beats_loss=0, ecapa_loss=0.0005098, whisper_loss=0.25, over 922467.00 frames. 2024-08-19 10:13:54,369 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7156, 2.6009, 2.6896, 2.5589], device='cuda:3') 2024-08-19 10:13:57,391 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on SV_voxceleb1: loss=0.003986, beats_loss=0, ecapa_loss=0.0003986, whisper_loss=0, over 939242.00 frames. 2024-08-19 10:14:59,792 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1762, 1.7731, 1.9411, 1.7801], device='cuda:3') 2024-08-19 10:15:38,042 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([0.0014, 0.0477, 0.0025, 0.0308, 0.0031, 0.0887, 0.0192, 0.0447], device='cuda:3') 2024-08-19 10:15:43,699 INFO [train_multi_KD3.py:1149] (3/4) Epoch 30, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 
2024-08-19 10:15:43,703 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 10:15:51,923 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-08-19 10:15:52,506 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 16 from LS+wenet, 14 from Vox, 29 from AS 2024-08-19 10:16:01,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4421090.0, ans=0.125 2024-08-19 10:16:12,686 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 13 from Vox, 26 from AS 2024-08-19 10:16:17,200 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=4421190.0, ans=0.0 2024-08-19 10:16:22,829 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.555e+01 2.285e+01 2.523e+01 2.857e+01 6.051e+02, threshold=5.046e+01, percent-clipped=3.0 2024-08-19 10:16:34,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4421390.0, ans=0.125 2024-08-19 10:16:46,476 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12050, loss[loss=0.1444, beats_loss=0.006575, ecapa_loss=0.0001456, whisper_loss=0.1364, over 16751.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01058, ecapa_loss=0.0001405, whisper_loss=0.08913, over 3850659.90 frames. ], batch size: 62, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:16:48,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4421490.0, ans=0.0 2024-08-19 10:17:07,103 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 13 from LS+wenet, 15 from Vox, 25 from AS 2024-08-19 10:17:23,693 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts.
19 from LS+wenet, 25 from Vox, 25 from AS 2024-08-19 10:17:28,808 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 21 from Vox, 24 from AS 2024-08-19 10:17:31,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4421790.0, ans=0.125 2024-08-19 10:17:32,465 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 27 from LS+wenet, 21 from Vox, 29 from AS 2024-08-19 10:17:49,686 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12100, loss[loss=0.09852, beats_loss=0.01062, ecapa_loss=0.0001551, whisper_loss=0.08635, over 16255.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01048, ecapa_loss=0.0001413, whisper_loss=0.08978, over 3851958.76 frames. ], batch size: 67, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:18:01,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4422090.0, ans=0.1 2024-08-19 10:18:02,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4422090.0, ans=0.2 2024-08-19 10:18:06,271 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4422090.0, ans=0.125 2024-08-19 10:18:07,635 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4422090.0, ans=0.125 2024-08-19 10:18:08,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4422090.0, ans=0.125 2024-08-19 10:18:16,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4422190.0, ans=0.0 2024-08-19 10:18:28,089 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.819e+01 2.327e+01 2.569e+01 2.945e+01 4.765e+01, threshold=5.137e+01, percent-clipped=0.0 2024-08-19
10:18:32,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=4422290.0, ans=0.125 2024-08-19 10:18:41,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4422390.0, ans=0.125 2024-08-19 10:18:47,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4422390.0, ans=0.0 2024-08-19 10:18:51,762 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12150, loss[loss=0.1308, beats_loss=0.007697, ecapa_loss=0.0001661, whisper_loss=0.1215, over 19773.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01054, ecapa_loss=0.0001419, whisper_loss=0.08919, over 3849824.42 frames. ], batch size: 77, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:19:05,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4422590.0, ans=0.2 2024-08-19 10:19:05,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=4422590.0, ans=0.5 2024-08-19 10:19:14,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4422590.0, ans=0.125 2024-08-19 10:19:17,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4422690.0, ans=0.0 2024-08-19 10:19:21,767 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 23 from Vox, 37 from AS 2024-08-19 10:19:22,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4422690.0, ans=10.0 2024-08-19 10:19:22,382 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.47 vs.
limit=10.0 2024-08-19 10:19:28,383 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4422790.0, ans=0.125 2024-08-19 10:19:38,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4422790.0, ans=0.2 2024-08-19 10:19:38,659 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2024-08-19 10:19:46,070 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=4422890.0, ans=0.05 2024-08-19 10:19:54,292 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12200, loss[loss=0.1128, beats_loss=0.01069, ecapa_loss=0.0001289, whisper_loss=0.1009, over 20215.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01051, ecapa_loss=0.0001415, whisper_loss=0.08949, over 3851479.79 frames. ], batch size: 79, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:20:27,796 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 26 from Vox, 37 from AS 2024-08-19 10:20:31,835 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4423290.0, ans=0.125 2024-08-19 10:20:32,464 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.833e+01 2.316e+01 2.610e+01 2.989e+01 7.361e+01, threshold=5.220e+01, percent-clipped=1.0 2024-08-19 10:20:33,854 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts.
24 from LS+wenet, 27 from Vox, 44 from AS 2024-08-19 10:20:36,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=4423290.0, ans=10.0 2024-08-19 10:20:49,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4423390.0, ans=0.0 2024-08-19 10:20:52,414 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 20 from LS+wenet, 18 from Vox, 39 from AS 2024-08-19 10:20:55,997 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12250, loss[loss=0.1101, beats_loss=0.01008, ecapa_loss=0.0001744, whisper_loss=0.09833, over 17449.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.08947, over 3842696.95 frames. ], batch size: 70, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:20:56,167 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 17 from LS+wenet, 28 from Vox, 30 from AS 2024-08-19 10:21:01,034 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 35 from LS+wenet, 18 from Vox, 35 from AS 2024-08-19 10:21:05,064 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.34 vs.
limit=15.0 2024-08-19 10:21:19,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4423690.0, ans=0.125 2024-08-19 10:21:26,179 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4423690.0, ans=0.0 2024-08-19 10:21:29,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4423690.0, ans=0.04949747468305833 2024-08-19 10:21:30,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4423690.0, ans=0.125 2024-08-19 10:21:32,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4423790.0, ans=0.125 2024-08-19 10:21:45,058 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4423890.0, ans=0.125 2024-08-19 10:21:47,474 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 23 from LS+wenet, 31 from Vox, 36 from AS 2024-08-19 10:21:49,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4423890.0, ans=0.125 2024-08-19 10:21:56,822 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.22 vs. limit=15.0 2024-08-19 10:21:58,500 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12300, loss[loss=0.1161, beats_loss=0.007738, ecapa_loss=0.0001514, whisper_loss=0.1068, over 14633.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001432, whisper_loss=0.08904, over 3841148.55 frames.
], batch size: 56, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:22:04,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4423990.0, ans=0.125 2024-08-19 10:22:05,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.23 vs. limit=15.0 2024-08-19 10:22:08,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4423990.0, ans=0.2 2024-08-19 10:22:11,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4424090.0, ans=0.125 2024-08-19 10:22:21,536 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4424090.0, ans=0.125 2024-08-19 10:22:25,245 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 from AS 2024-08-19 10:22:38,474 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.350e+01 2.584e+01 3.104e+01 8.227e+01, threshold=5.169e+01, percent-clipped=2.0 2024-08-19 10:23:03,567 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12350, loss[loss=0.1051, beats_loss=0.01094, ecapa_loss=0.0001217, whisper_loss=0.09296, over 18485.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01035, ecapa_loss=0.0001438, whisper_loss=0.08972, over 3828212.12 frames. ], batch size: 72, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:23:17,389 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 24 from LS+wenet, 14 from Vox, 22 from AS 2024-08-19 10:23:20,617 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts.
19 from LS+wenet, 21 from Vox, 41 from AS 2024-08-19 10:23:37,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4424690.0, ans=0.125 2024-08-19 10:23:43,529 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 17 from Vox, 27 from AS 2024-08-19 10:23:48,422 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 17 from Vox, 44 from AS 2024-08-19 10:23:48,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4424790.0, ans=0.0 2024-08-19 10:23:53,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4424790.0, ans=0.0 2024-08-19 10:24:06,793 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.14 vs. limit=22.5 2024-08-19 10:24:09,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4424890.0, ans=0.125 2024-08-19 10:24:12,975 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12400, loss[loss=0.1226, beats_loss=0.008439, ecapa_loss=0.0001369, whisper_loss=0.1128, over 14757.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01036, ecapa_loss=0.0001428, whisper_loss=0.08993, over 3856654.29 frames. ], batch size: 55, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:24:19,640 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4424990.0, ans=0.1 2024-08-19 10:24:24,308 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs.
limit=15.0 2024-08-19 10:24:25,157 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4424990.0, ans=0.0 2024-08-19 10:24:30,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=12.0 2024-08-19 10:24:33,386 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 29 from Vox, 32 from AS 2024-08-19 10:24:43,300 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS 2024-08-19 10:24:56,418 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.365e+01 2.557e+01 2.842e+01 4.073e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-19 10:24:56,748 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 24 from Vox, 25 from AS 2024-08-19 10:25:10,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4425390.0, ans=0.0 2024-08-19 10:25:24,581 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12450, loss[loss=0.1008, beats_loss=0.008532, ecapa_loss=0.0001743, whisper_loss=0.09051, over 17614.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01033, ecapa_loss=0.0001434, whisper_loss=0.08966, over 3859021.45 frames. ], batch size: 72, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:25:24,746 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 22 from LS+wenet, 25 from Vox, 44 from AS 2024-08-19 10:25:24,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4425490.0, ans=0.125 2024-08-19 10:25:39,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=4425590.0, ans=0.0 2024-08-19 10:25:42,230 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
14 from LS+wenet, 17 from Vox, 23 from AS 2024-08-19 10:25:53,380 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 24 from LS+wenet, 25 from Vox, 31 from AS 2024-08-19 10:26:01,020 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2024-08-19 10:26:25,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=4425890.0, ans=0.05 2024-08-19 10:26:34,571 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12500, loss[loss=0.09294, beats_loss=0.01287, ecapa_loss=0.0001242, whisper_loss=0.07883, over 22889.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01037, ecapa_loss=0.0001424, whisper_loss=0.09022, over 3936481.38 frames. ], batch size: 93, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:26:44,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4425990.0, ans=0.125 2024-08-19 10:26:55,496 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 22 from Vox, 25 from AS 2024-08-19 10:27:15,119 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 19 from Vox, 32 from AS 2024-08-19 10:27:17,644 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.811e+01 2.209e+01 2.418e+01 2.656e+01 4.014e+01, threshold=4.837e+01, percent-clipped=0.0 2024-08-19 10:27:43,647 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12550, loss[loss=0.0804, beats_loss=0.01013, ecapa_loss=0.0001596, whisper_loss=0.06867, over 20680.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01041, ecapa_loss=0.0001418, whisper_loss=0.08998, over 3913870.72 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:27:43,797 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts.
21 from LS+wenet, 31 from Vox, 41 from AS 2024-08-19 10:27:45,112 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 15 from Vox, 35 from AS 2024-08-19 10:27:47,935 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=22.5 2024-08-19 10:27:52,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4426490.0, ans=0.1 2024-08-19 10:27:52,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4426490.0, ans=0.0 2024-08-19 10:28:22,286 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 20 from Vox, 20 from AS 2024-08-19 10:28:31,987 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 21 from Vox, 43 from AS 2024-08-19 10:28:43,355 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 25 from LS+wenet, 14 from Vox, 27 from AS 2024-08-19 10:28:43,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4426890.0, ans=0.0 2024-08-19 10:28:47,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4426890.0, ans=0.125 2024-08-19 10:28:52,394 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12600, loss[loss=0.09596, beats_loss=0.01037, ecapa_loss=0.0001373, whisper_loss=0.08422, over 18038.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0104, ecapa_loss=0.000142, whisper_loss=0.09072, over 3890843.67 frames. ], batch size: 74, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:28:52,729 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts.
18 from LS+wenet, 12 from Vox, 30 from AS 2024-08-19 10:28:54,162 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4426990.0, ans=0.125 2024-08-19 10:28:55,519 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4426990.0, ans=0.0 2024-08-19 10:29:01,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4426990.0, ans=0.125 2024-08-19 10:29:03,798 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4426990.0, ans=0.2 2024-08-19 10:29:06,676 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4427090.0, ans=10.0 2024-08-19 10:29:26,830 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.44 vs. limit=15.0 2024-08-19 10:29:33,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.965e+01 2.274e+01 2.527e+01 2.762e+01 5.696e+01, threshold=5.054e+01, percent-clipped=1.0 2024-08-19 10:29:44,606 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4427390.0, ans=0.1 2024-08-19 10:29:53,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4427390.0, ans=10.0 2024-08-19 10:29:54,577 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts.
25 from LS+wenet, 14 from Vox, 46 from AS 2024-08-19 10:29:57,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4427490.0, ans=0.2 2024-08-19 10:29:58,669 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12650, loss[loss=0.1048, beats_loss=0.01109, ecapa_loss=0.0001545, whisper_loss=0.09216, over 22168.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01055, ecapa_loss=0.000141, whisper_loss=0.08988, over 3894282.49 frames. ], batch size: 92, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:30:44,301 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.47 vs. limit=22.5 2024-08-19 10:30:48,510 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 29 from Vox, 36 from AS 2024-08-19 10:30:51,137 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 25 from LS+wenet, 17 from Vox, 18 from AS 2024-08-19 10:30:51,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4427890.0, ans=0.125 2024-08-19 10:30:55,087 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4427890.0, ans=0.0 2024-08-19 10:30:57,241 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 21 from LS+wenet, 25 from Vox, 36 from AS 2024-08-19 10:31:01,099 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 21 from Vox, 28 from AS 2024-08-19 10:31:03,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12700, loss[loss=0.101, beats_loss=0.01165, ecapa_loss=0.0001478, whisper_loss=0.08791, over 22131.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01056, ecapa_loss=0.0001416, whisper_loss=0.09, over 3889212.48 frames. ], batch size: 96, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:31:19,634 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts.
21 from LS+wenet, 13 from Vox, 21 from AS 2024-08-19 10:31:28,281 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4428090.0, ans=0.0 2024-08-19 10:31:30,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4428190.0, ans=0.125 2024-08-19 10:31:42,500 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 13 from LS+wenet, 14 from Vox, 27 from AS 2024-08-19 10:31:45,157 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.368e+01 2.557e+01 2.871e+01 4.652e+01, threshold=5.113e+01, percent-clipped=0.0 2024-08-19 10:31:50,565 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 19 from Vox, 39 from AS 2024-08-19 10:32:07,615 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.66 vs. limit=15.0 2024-08-19 10:32:10,632 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12750, loss[loss=0.09957, beats_loss=0.01019, ecapa_loss=0.0001211, whisper_loss=0.08816, over 18335.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01048, ecapa_loss=0.0001419, whisper_loss=0.09061, over 3910456.53 frames. ], batch size: 70, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:32:15,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=4428490.0, ans=0.2 2024-08-19 10:32:15,404 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2024-08-19 10:32:18,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4428490.0, ans=0.0 2024-08-19 10:32:22,323 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts.
25 from LS+wenet, 19 from Vox, 31 from AS 2024-08-19 10:32:35,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4428690.0, ans=0.0 2024-08-19 10:32:46,837 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 10 from LS+wenet, 14 from Vox, 32 from AS 2024-08-19 10:32:52,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4428790.0, ans=0.125 2024-08-19 10:32:59,536 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.08 vs. limit=22.5 2024-08-19 10:33:01,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4428890.0, ans=0.0 2024-08-19 10:33:15,407 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12800, loss[loss=0.1064, beats_loss=0.01121, ecapa_loss=8.945e-05, whisper_loss=0.09427, over 17826.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01055, ecapa_loss=0.000142, whisper_loss=0.08997, over 3902568.12 frames. ], batch size: 66, lr: 2.03e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 10:33:19,890 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 25 from LS+wenet, 18 from Vox, 27 from AS 2024-08-19 10:33:25,178 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 34 from LS+wenet, 29 from Vox, 31 from AS 2024-08-19 10:33:28,575 WARNING [optim.py:496] (3/4) Scaling gradients by 0.025292346253991127, model_norm_threshold=51.13230514526367 2024-08-19 10:33:28,738 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.encoder.layers.2.norm.log_scale with proportion 0.17, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=7.119e+05, grad_sumsq=7.119e+05, orig_rms_sq=1.000e+00 2024-08-19 10:33:28,912 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts.
26 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 10:33:41,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=4429190.0, ans=0.125 2024-08-19 10:33:51,230 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4429190.0, ans=10.0 2024-08-19 10:33:52,509 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4429190.0, ans=0.125 2024-08-19 10:33:57,296 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.869e+01 2.329e+01 2.594e+01 2.877e+01 2.022e+03, threshold=5.187e+01, percent-clipped=2.0 2024-08-19 10:34:14,141 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4429390.0, ans=0.125 2024-08-19 10:34:21,566 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12850, loss[loss=0.1206, beats_loss=0.008977, ecapa_loss=0.0001729, whisper_loss=0.1099, over 17099.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01064, ecapa_loss=0.0001421, whisper_loss=0.08982, over 3893680.40 frames. ], batch size: 69, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:34:34,500 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4429590.0, ans=0.0 2024-08-19 10:34:35,448 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 10:34:38,846 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=4429590.0, ans=0.2 2024-08-19 10:34:43,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4429590.0, ans=0.0 2024-08-19 10:34:47,549 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 
22 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 10:34:48,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4429690.0, ans=0.0 2024-08-19 10:34:49,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=4429690.0, ans=0.0 2024-08-19 10:35:02,683 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2024-08-19 10:35:08,776 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4429790.0, ans=0.025 2024-08-19 10:35:09,783 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 16 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 10:35:10,086 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4429790.0, ans=0.125 2024-08-19 10:35:21,307 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 10:35:27,933 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 34 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 10:35:29,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12900, loss[loss=0.1175, beats_loss=0.008482, ecapa_loss=0.0001678, whisper_loss=0.1074, over 22752.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01063, ecapa_loss=0.000142, whisper_loss=0.08962, over 3876552.26 frames. 
], batch size: 92, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:35:40,110 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4429990.0, ans=0.125 2024-08-19 10:35:44,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4430090.0, ans=0.0 2024-08-19 10:35:48,887 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4430090.0, ans=0.0 2024-08-19 10:35:55,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4430190.0, ans=0.125 2024-08-19 10:36:11,887 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.655e+01 2.251e+01 2.498e+01 2.810e+01 4.118e+01, threshold=4.997e+01, percent-clipped=0.0 2024-08-19 10:36:14,304 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4430290.0, ans=0.1 2024-08-19 10:36:16,495 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4430290.0, ans=0.015 2024-08-19 10:36:23,926 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.868e-02 2024-08-19 10:36:29,407 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4430390.0, ans=0.2 2024-08-19 10:36:37,359 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 12950, loss[loss=0.09764, beats_loss=0.009051, ecapa_loss=0.0001433, whisper_loss=0.08716, over 15845.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01052, ecapa_loss=0.0001424, whisper_loss=0.0897, over 3908399.89 frames. ], batch size: 62, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:37:07,291 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 10:37:08,529 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 10:37:26,035 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4430790.0, ans=0.125 2024-08-19 10:37:34,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4430890.0, ans=0.125 2024-08-19 10:37:41,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4430890.0, ans=0.125 2024-08-19 10:37:44,819 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 23 from LS+wenet, 25 from Vox, 22 fro AS 2024-08-19 10:37:45,774 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13000, loss[loss=0.09235, beats_loss=0.008482, ecapa_loss=0.0001895, whisper_loss=0.08197, over 16364.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01052, ecapa_loss=0.0001417, whisper_loss=0.09014, over 3937441.28 frames. ], batch size: 70, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:37:46,534 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 18 from Vox, 35 fro AS 2024-08-19 10:37:56,309 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-19 10:38:09,501 INFO [train_multi_KD3.py:844] (3/4) A total of 96 cuts. 
32 from LS+wenet, 23 from Vox, 41 fro AS 2024-08-19 10:38:11,014 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4431090.0, ans=0.125 2024-08-19 10:38:11,126 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4431090.0, ans=0.2 2024-08-19 10:38:12,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4431190.0, ans=0.0 2024-08-19 10:38:14,675 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 16 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 10:38:18,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4431190.0, ans=0.125 2024-08-19 10:38:24,297 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 15 from Vox, 21 fro AS 2024-08-19 10:38:29,947 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.291e+01 2.501e+01 2.762e+01 5.240e+01, threshold=5.001e+01, percent-clipped=1.0 2024-08-19 10:38:47,943 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4431390.0, ans=0.035 2024-08-19 10:38:54,508 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13050, loss[loss=0.1079, beats_loss=0.008169, ecapa_loss=0.0001586, whisper_loss=0.09818, over 20516.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001415, whisper_loss=0.08985, over 3912740.83 frames. ], batch size: 83, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:38:56,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4431490.0, ans=0.1 2024-08-19 10:39:14,446 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. 
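In the `Clipping_scale=2.0` summaries, the reported `threshold` tracks twice the median of the grad-norm quartiles (e.g. 2.0 × 2.501e+01 ≈ 5.001e+01 in the line above), suggesting the clipping threshold is `Clipping_scale` times a running median of recently observed gradient norms. A sketch under that assumption:

```python
# Sketch: clipping threshold as Clipping_scale times the running median of
# recent gradient norms. Assumption inferred from the summaries, where
# threshold ≈ 2.0 * the logged median quartile.
import statistics

def clipping_threshold(recent_grad_norms, clipping_scale=2.0):
    return clipping_scale * statistics.median(recent_grad_norms)

# Using the five quartile values logged above as a stand-in window:
# clipping_threshold([17.90, 22.91, 25.01, 27.62, 52.40]) ≈ 50.02
```

`percent-clipped` then reports how often batches in the interval exceeded this threshold; a single extreme batch (like the 2.022e+03 outlier earlier) barely moves the median-based threshold, which is presumably the point of the design.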
limit=6.0 2024-08-19 10:39:16,872 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 33 from LS+wenet, 15 from Vox, 30 fro AS 2024-08-19 10:39:29,216 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.38 vs. limit=22.5 2024-08-19 10:39:33,188 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4431690.0, ans=0.2 2024-08-19 10:39:33,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2024-08-19 10:39:36,436 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.84 vs. limit=12.0 2024-08-19 10:40:01,708 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.96 vs. limit=10.0 2024-08-19 10:40:06,799 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13100, loss[loss=0.09486, beats_loss=0.009141, ecapa_loss=0.0001704, whisper_loss=0.08401, over 17870.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001396, whisper_loss=0.08964, over 3895826.21 frames. ], batch size: 71, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:40:08,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4431990.0, ans=0.125 2024-08-19 10:40:11,670 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4431990.0, ans=0.025 2024-08-19 10:40:13,126 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2024-08-19 10:40:26,786 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
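The `Whitening:` lines compare a per-module covariance statistic against a limit (`metric=M vs. limit=L`); the metric grows as the channel covariance eigenvalues become less uniform, i.e. as the features drift away from being white. A simplified sketch of such a metric (loosely modeled on icefall's `Whiten` module; the exact formula below is an assumption, not the project's code):

```python
import numpy as np

# Simplified whitening metric: ratio of the mean squared covariance
# eigenvalue to the squared mean eigenvalue. It equals 1.0 for perfectly
# white features and grows with the eigenvalue spread.
def whitening_metric(x):
    # x: (num_frames, num_channels)
    x = x - x.mean(axis=0)
    cov = (x.T @ x) / len(x)
    eigs = np.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))
```

In the log, a corrective penalty would presumably only engage when the metric exceeds the limit, so a line like `metric=12.66 vs. limit=15.0` records a module that is currently within bounds.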
23 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-19 10:40:27,029 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4432090.0, ans=0.125 2024-08-19 10:40:32,607 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 10:40:33,237 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.74 vs. limit=15.0 2024-08-19 10:40:39,477 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 14 from Vox, 36 fro AS 2024-08-19 10:40:51,946 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.810e+01 2.322e+01 2.530e+01 2.794e+01 4.175e+01, threshold=5.059e+01, percent-clipped=0.0 2024-08-19 10:40:58,556 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 27 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 10:41:15,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4432390.0, ans=0.125 2024-08-19 10:41:17,003 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4432490.0, ans=0.0 2024-08-19 10:41:17,792 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13150, loss[loss=0.1108, beats_loss=0.009902, ecapa_loss=0.000152, whisper_loss=0.09941, over 22153.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001404, whisper_loss=0.09014, over 3886612.89 frames. ], batch size: 88, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:41:23,568 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 18 from LS+wenet, 19 from Vox, 29 fro AS 2024-08-19 10:41:24,890 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 10:41:27,652 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
33 from LS+wenet, 17 from Vox, 41 fro AS 2024-08-19 10:41:31,292 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=4432590.0, ans=0.2 2024-08-19 10:41:33,889 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 12 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 10:41:51,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4432690.0, ans=0.125 2024-08-19 10:41:55,021 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 24 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 10:41:56,597 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4432690.0, ans=0.125 2024-08-19 10:42:05,454 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4432790.0, ans=0.1 2024-08-19 10:42:17,943 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 23 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 10:42:28,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13200, loss[loss=0.08321, beats_loss=0.01334, ecapa_loss=0.000114, whisper_loss=0.06873, over 15563.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01048, ecapa_loss=0.0001394, whisper_loss=0.08998, over 3899867.56 frames. ], batch size: 63, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:42:32,275 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4432990.0, ans=0.125 2024-08-19 10:42:40,400 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
28 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 10:42:54,326 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4433090.0, ans=0.125 2024-08-19 10:42:54,624 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-19 10:42:55,201 WARNING [optim.py:496] (3/4) Scaling gradients by 0.07873938977718353, model_norm_threshold=50.591163635253906 2024-08-19 10:42:55,366 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.14, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.905e+04, grad_sumsq=5.905e+04, orig_rms_sq=1.000e+00 2024-08-19 10:42:56,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4433190.0, ans=0.1 2024-08-19 10:43:05,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4433190.0, ans=0.125 2024-08-19 10:43:13,599 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.802e+01 2.266e+01 2.510e+01 2.814e+01 6.425e+02, threshold=5.020e+01, percent-clipped=2.0 2024-08-19 10:43:39,125 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 28 from Vox, 36 fro AS 2024-08-19 10:43:41,521 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13250, loss[loss=0.09654, beats_loss=0.009644, ecapa_loss=0.0001599, whisper_loss=0.08529, over 16982.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001412, whisper_loss=0.09008, over 3908479.46 frames. 
], batch size: 67, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:43:55,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4433590.0, ans=0.125 2024-08-19 10:43:58,204 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4433590.0, ans=0.1 2024-08-19 10:44:06,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4433590.0, ans=0.125 2024-08-19 10:44:36,344 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4433790.0, ans=0.1 2024-08-19 10:44:52,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4433890.0, ans=0.2 2024-08-19 10:44:54,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13300, loss[loss=0.1069, beats_loss=0.009429, ecapa_loss=0.0001478, whisper_loss=0.09597, over 18575.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0104, ecapa_loss=0.0001413, whisper_loss=0.09008, over 3886858.06 frames. ], batch size: 75, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:44:54,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4433990.0, ans=0.0 2024-08-19 10:45:06,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4433990.0, ans=0.0 2024-08-19 10:45:14,386 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.90 vs. 
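Each `ScheduledFloat` line reports a regularization hyperparameter (skip rates, balancer probabilities, `scale_min` values, dropout probabilities) whose current value (`ans=...`) is a function of `batch_count`. A minimal sketch of a piecewise-linear schedule of that kind (the breakpoints below are made-up examples, not values from this run):

```python
# Sketch: a ScheduledFloat-style value computed as a piecewise-linear
# function of batch_count. Breakpoints here are illustrative only.
def scheduled_float(batch_count, points):
    """points: sorted list of (batch_count, value) breakpoints."""
    b0, v0 = points[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in points[1:]:
        if batch_count <= b1:
            return v0 + (batch_count - b0) / (b1 - b0) * (v1 - v0)
        b0, v0 = b1, v1
    return v0  # hold the last value past the final breakpoint

# e.g. a skip rate decaying from 0.2 to 0.0 over the first 4000 batches,
# then held at 0.0:
# scheduled_float(2000, [(0.0, 0.2), (4000.0, 0.0)]) -> 0.1
```

This matches the pattern visible in the log, where at batch_count ≈ 4.4e+06 most skip rates have long since settled at their terminal values (`ans=0.0`).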
limit=15.0 2024-08-19 10:45:21,690 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4434090.0, ans=0.125 2024-08-19 10:45:23,479 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 30 from LS+wenet, 16 from Vox, 34 fro AS 2024-08-19 10:45:38,757 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 22 from Vox, 34 fro AS 2024-08-19 10:45:42,570 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.762e+01 2.304e+01 2.512e+01 2.762e+01 6.116e+01, threshold=5.024e+01, percent-clipped=2.0 2024-08-19 10:45:49,114 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 32 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 10:46:02,512 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 10:46:09,130 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13350, loss[loss=0.0988, beats_loss=0.01211, ecapa_loss=0.0001396, whisper_loss=0.08529, over 21441.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01037, ecapa_loss=0.0001413, whisper_loss=0.09016, over 3889019.03 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:46:23,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4434590.0, ans=0.0 2024-08-19 10:46:30,406 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 10:46:33,376 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4434590.0, ans=0.2 2024-08-19 10:46:34,261 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 20 from Vox, 27 fro AS 2024-08-19 10:46:49,031 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 10:47:21,305 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13400, loss[loss=0.1182, beats_loss=0.009641, ecapa_loss=0.0001616, whisper_loss=0.1069, over 22566.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001417, whisper_loss=0.08977, over 3878170.38 frames. ], batch size: 91, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:47:24,961 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 16 from Vox, 32 fro AS 2024-08-19 10:47:28,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4434990.0, ans=0.1 2024-08-19 10:47:42,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4435090.0, ans=0.1 2024-08-19 10:48:03,413 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 24 from LS+wenet, 17 from Vox, 29 fro AS 2024-08-19 10:48:03,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4435290.0, ans=0.0 2024-08-19 10:48:05,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.898e+01 2.287e+01 2.514e+01 2.765e+01 5.474e+01, threshold=5.028e+01, percent-clipped=1.0 2024-08-19 10:48:11,311 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 36 from LS+wenet, 12 from Vox, 36 fro AS 2024-08-19 10:48:20,041 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 10:48:22,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=4435390.0, ans=0.2 2024-08-19 10:48:23,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4435390.0, ans=0.125 2024-08-19 10:48:27,481 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 18 from Vox, 26 fro AS 2024-08-19 10:48:30,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4435490.0, ans=0.0 2024-08-19 10:48:31,444 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13450, loss[loss=0.09809, beats_loss=0.01182, ecapa_loss=0.0001058, whisper_loss=0.08521, over 19793.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01038, ecapa_loss=0.0001419, whisper_loss=0.09052, over 3884595.11 frames. ], batch size: 75, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:48:40,336 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4435490.0, ans=0.2 2024-08-19 10:48:41,286 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 10:48:44,702 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2024-08-19 10:48:48,689 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4435590.0, ans=0.07 2024-08-19 10:48:50,953 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 19 from Vox, 47 fro AS 2024-08-19 10:49:09,514 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 10:49:12,015 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 10:49:29,618 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 fro AS 2024-08-19 10:49:31,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4435890.0, ans=0.0 2024-08-19 10:49:32,475 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
15 from LS+wenet, 31 from Vox, 39 fro AS 2024-08-19 10:49:41,201 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13500, loss[loss=0.1053, beats_loss=0.009124, ecapa_loss=0.0001581, whisper_loss=0.09465, over 20186.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001426, whisper_loss=0.0907, over 3908336.72 frames. ], batch size: 82, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:49:59,201 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 10:50:12,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4436190.0, ans=0.125 2024-08-19 10:50:20,142 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4436190.0, ans=0.125 2024-08-19 10:50:23,302 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 20 from Vox, 36 fro AS 2024-08-19 10:50:24,354 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.781e+01 2.393e+01 2.618e+01 2.905e+01 4.660e+01, threshold=5.237e+01, percent-clipped=0.0 2024-08-19 10:50:32,876 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.24 vs. limit=22.5 2024-08-19 10:50:39,087 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
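For post-hoc analysis, the per-batch summary lines have a regular shape (`Epoch E, batch B, loss[...], tot_loss[loss=..., ...]`) and can be scraped with a regex. A small sketch targeting the format visible in this log (field names as logged; the helper name is an assumption):

```python
import re

# Extract (epoch, batch, total loss) from per-batch summary lines such as
# "... Epoch 30, batch 13000, loss[...], tot_loss[loss=0.1021, ...] ...".
PATTERN = re.compile(
    r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([\d.eE+-]+)")

def parse_tot_loss(line):
    m = PATTERN.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), float(m.group(3))

# The non-greedy .*? skips past the per-batch loss[...] block so only the
# running tot_loss value is captured.
```

Plotting the captured `tot_loss` values against `batch` reproduces the training curve without needing the TensorBoard event files mentioned in the header (`'tensorboard': True`).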
34 from LS+wenet, 24 from Vox, 33 fro AS 2024-08-19 10:50:39,710 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4436390.0, ans=0.1 2024-08-19 10:50:46,975 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4436490.0, ans=0.125 2024-08-19 10:50:47,735 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13550, loss[loss=0.09907, beats_loss=0.01295, ecapa_loss=0.0001242, whisper_loss=0.08487, over 21912.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001423, whisper_loss=0.09025, over 3924157.19 frames. ], batch size: 89, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:50:49,336 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 26 from Vox, 40 fro AS 2024-08-19 10:50:50,416 WARNING [optim.py:496] (3/4) Scaling gradients by 0.0421869195997715, model_norm_threshold=52.36887741088867 2024-08-19 10:50:50,581 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.1.encoder.layers.1.norm.log_scale with proportion 0.15, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.302e+05, grad_sumsq=2.302e+05, orig_rms_sq=1.000e+00 2024-08-19 10:50:56,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4436490.0, ans=0.2 2024-08-19 10:51:03,150 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.32 vs. 
limit=15.0 2024-08-19 10:51:15,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4436690.0, ans=0.0 2024-08-19 10:51:22,445 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4436690.0, ans=0.0 2024-08-19 10:51:25,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4436690.0, ans=0.05 2024-08-19 10:51:51,785 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.85 vs. limit=10.0 2024-08-19 10:51:55,148 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13600, loss[loss=0.1004, beats_loss=0.008987, ecapa_loss=0.0001224, whisper_loss=0.09019, over 15413.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001419, whisper_loss=0.09046, over 3921420.57 frames. ], batch size: 58, lr: 2.03e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:51:56,725 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 30 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 10:52:09,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4437090.0, ans=0.0 2024-08-19 10:52:11,365 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 25 from LS+wenet, 27 from Vox, 38 fro AS 2024-08-19 10:52:22,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4437190.0, ans=0.1 2024-08-19 10:52:27,228 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.71 vs. 
limit=15.0 2024-08-19 10:52:32,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4437190.0, ans=0.125 2024-08-19 10:52:39,016 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.971e+01 2.324e+01 2.607e+01 2.904e+01 1.241e+03, threshold=5.213e+01, percent-clipped=1.0 2024-08-19 10:52:55,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.06 vs. limit=22.5 2024-08-19 10:53:02,592 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4437490.0, ans=0.1 2024-08-19 10:53:02,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4437490.0, ans=0.125 2024-08-19 10:53:03,391 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13650, loss[loss=0.1154, beats_loss=0.008855, ecapa_loss=0.0001704, whisper_loss=0.1049, over 22001.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01046, ecapa_loss=0.0001422, whisper_loss=0.09075, over 3935100.72 frames. ], batch size: 91, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:53:05,024 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 10:53:06,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4437490.0, ans=0.0 2024-08-19 10:53:09,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4437490.0, ans=0.0 2024-08-19 10:53:15,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4437490.0, ans=0.125 2024-08-19 10:53:15,996 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
28 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 10:53:19,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4437590.0, ans=0.125 2024-08-19 10:53:29,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2024-08-19 10:53:32,363 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.90 vs. limit=6.0 2024-08-19 10:53:53,926 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4437790.0, ans=0.2 2024-08-19 10:53:58,696 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5 2024-08-19 10:54:14,113 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13700, loss[loss=0.07466, beats_loss=0.01148, ecapa_loss=0.0001273, whisper_loss=0.0619, over 16983.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01054, ecapa_loss=0.0001415, whisper_loss=0.09043, over 3911361.07 frames. ], batch size: 68, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:54:20,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4437990.0, ans=0.0 2024-08-19 10:54:27,913 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4438090.0, ans=0.1 2024-08-19 10:54:31,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4438090.0, ans=0.0 2024-08-19 10:54:35,168 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=15.0 2024-08-19 10:54:45,781 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 16 from LS+wenet, 25 from Vox, 29 from AS 2024-08-19 10:54:48,802 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4438190.0, ans=0.125 2024-08-19 10:54:53,137 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 15 from LS+wenet, 20 from Vox, 40 from AS 2024-08-19 10:54:55,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.78 vs. limit=22.5 2024-08-19 10:54:56,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4438190.0, ans=0.0 2024-08-19 10:54:57,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-08-19 10:55:04,002 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.299e+01 2.529e+01 2.834e+01 4.971e+01, threshold=5.058e+01, percent-clipped=0.0 2024-08-19 10:55:04,823 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 22 from LS+wenet, 18 from Vox, 26 from AS 2024-08-19 10:55:09,666 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4438290.0, ans=0.0 2024-08-19 10:55:25,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4438390.0, ans=0.125 2024-08-19 10:55:36,422 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13750, loss[loss=0.09163, beats_loss=0.01338, ecapa_loss=8.384e-05, whisper_loss=0.07741, over 20226.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01053, ecapa_loss=0.0001408, whisper_loss=0.09005, over 3891928.84 frames.
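The optim.py lines such as `Clipping_scale=2.0, grad-norm quartiles 1.863e+01 2.299e+01 2.529e+01 2.834e+01 4.971e+01, threshold=5.058e+01` list the min/25%/median/75%/max of recently observed gradient norms, and the threshold is consistent with clipping_scale times the median (2.0 × 2.529e+01 = 5.058e+01). A rough sketch under that assumption (helper names are illustrative):

```python
def quartiles_and_threshold(recent_norms, clipping_scale=2.0):
    """Return the five quartile points of recent gradient norms and a
    clipping threshold of clipping_scale * median, matching the pattern
    visible in the log's optim.py lines."""
    xs = sorted(recent_norms)
    n = len(xs)
    quartiles = [xs[0], xs[n // 4], xs[n // 2], xs[(3 * n) // 4], xs[-1]]
    return quartiles, clipping_scale * quartiles[2]

# The five values from the log line above
norms = [18.63, 22.99, 25.29, 28.34, 49.71]
q, threshold = quartiles_and_threshold(norms)
print(threshold)  # 2.0 * median = 50.58, matching threshold=5.058e+01
```

percent-clipped then reports how often a batch's gradient norm exceeded that threshold over the logging interval.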
], batch size: 75, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:55:39,406 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 from AS 2024-08-19 10:55:45,604 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 26 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 10:56:05,632 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4438590.0, ans=0.07 2024-08-19 10:56:43,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4438790.0, ans=0.1 2024-08-19 10:57:13,437 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13800, loss[loss=0.1152, beats_loss=0.008214, ecapa_loss=0.0001456, whisper_loss=0.1055, over 16554.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01055, ecapa_loss=0.0001412, whisper_loss=0.09017, over 3899934.02 frames. ], batch size: 64, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:57:23,470 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 20 from Vox, 36 from AS 2024-08-19 10:57:43,881 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4439090.0, ans=0.0 2024-08-19 10:57:54,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=4439190.0, ans=15.0 2024-08-19 10:57:55,813 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4439190.0, ans=0.1 2024-08-19 10:57:59,973 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.42 vs.
limit=15.0 2024-08-19 10:58:10,789 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.867e+01 2.280e+01 2.516e+01 2.824e+01 6.653e+01, threshold=5.033e+01, percent-clipped=2.0 2024-08-19 10:58:27,738 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 29 from Vox, 30 from AS 2024-08-19 10:58:40,806 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13850, loss[loss=0.1139, beats_loss=0.01034, ecapa_loss=0.0001276, whisper_loss=0.1023, over 19320.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01042, ecapa_loss=0.0001424, whisper_loss=0.09157, over 3931016.90 frames. ], batch size: 75, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 10:58:42,555 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 25 from LS+wenet, 19 from Vox, 29 from AS 2024-08-19 10:58:43,071 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 10:58:57,384 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.81 vs. limit=22.5 2024-08-19 10:58:58,993 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=8.854e+00 2024-08-19 10:59:15,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=4439690.0, ans=0.125 2024-08-19 11:00:04,820 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 20 from Vox, 43 from AS 2024-08-19 11:00:05,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13900, loss[loss=0.1016, beats_loss=0.01157, ecapa_loss=0.0001511, whisper_loss=0.08856, over 22241.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01045, ecapa_loss=0.0001418, whisper_loss=0.09168, over 3903595.36 frames.
], batch size: 91, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:00:19,892 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4439990.0, ans=0.2 2024-08-19 11:00:50,723 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=4440190.0, ans=0.2 2024-08-19 11:00:52,338 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 30 from Vox, 26 from AS 2024-08-19 11:00:56,602 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 22 from LS+wenet, 24 from Vox, 23 from AS 2024-08-19 11:00:59,353 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 16 from Vox, 35 from AS 2024-08-19 11:01:02,399 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4440290.0, ans=0.125 2024-08-19 11:01:03,245 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.879e+01 2.454e+01 2.690e+01 3.083e+01 4.559e+01, threshold=5.380e+01, percent-clipped=0.0 2024-08-19 11:01:05,580 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.68 vs. limit=15.0 2024-08-19 11:01:23,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=4440390.0, ans=0.05 2024-08-19 11:01:25,135 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 19 from Vox, 20 from AS 2024-08-19 11:01:31,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 13950, loss[loss=0.1185, beats_loss=0.007757, ecapa_loss=0.0001595, whisper_loss=0.1091, over 22400.00 frames. ], tot_loss[loss=0.1041, beats_loss=0.01034, ecapa_loss=0.0001421, whisper_loss=0.09235, over 3880038.51 frames.
], batch size: 90, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:01:40,481 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 20 from LS+wenet, 25 from Vox, 47 from AS 2024-08-19 11:01:58,791 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4440590.0, ans=0.0 2024-08-19 11:02:02,997 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 21 from Vox, 40 from AS 2024-08-19 11:02:03,959 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=4440690.0, ans=0.0 2024-08-19 11:02:12,389 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4440690.0, ans=0.125 2024-08-19 11:02:27,952 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4440790.0, ans=0.125 2024-08-19 11:02:33,714 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 27 from LS+wenet, 9 from Vox, 18 from AS 2024-08-19 11:02:37,462 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.35 vs. limit=22.5 2024-08-19 11:02:44,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4440890.0, ans=0.1 2024-08-19 11:02:45,819 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 24 from LS+wenet, 21 from Vox, 44 from AS 2024-08-19 11:02:50,963 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14000, loss[loss=0.1062, beats_loss=0.008505, ecapa_loss=0.000146, whisper_loss=0.09626, over 19945.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001417, whisper_loss=0.09155, over 3870447.70 frames.
], batch size: 79, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:03:19,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.08 vs. limit=15.0 2024-08-19 11:03:24,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4441090.0, ans=0.1 2024-08-19 11:03:49,622 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.993e+01 2.282e+01 2.482e+01 2.793e+01 5.797e+01, threshold=4.965e+01, percent-clipped=1.0 2024-08-19 11:04:25,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14050, loss[loss=0.09442, beats_loss=0.01159, ecapa_loss=8.648e-05, whisper_loss=0.08196, over 14314.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01043, ecapa_loss=0.000141, whisper_loss=0.09144, over 3860106.91 frames. ], batch size: 54, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:04:28,529 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4441490.0, ans=0.1 2024-08-19 11:04:41,766 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 19 from LS+wenet, 15 from Vox, 32 from AS 2024-08-19 11:05:03,528 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=4441690.0, ans=0.025 2024-08-19 11:05:16,642 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 20 from Vox, 27 from AS 2024-08-19 11:05:17,091 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4441790.0, ans=0.125 2024-08-19 11:05:20,552 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts.
29 from LS+wenet, 23 from Vox, 37 from AS 2024-08-19 11:05:34,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4441890.0, ans=0.05 2024-08-19 11:05:42,316 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 19 from Vox, 33 from AS 2024-08-19 11:05:49,885 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14100, loss[loss=0.1084, beats_loss=0.01143, ecapa_loss=0.0001574, whisper_loss=0.09537, over 17025.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.0001406, whisper_loss=0.0911, over 3845849.99 frames. ], batch size: 70, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:06:08,498 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 23 from Vox, 38 from AS 2024-08-19 11:06:47,658 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.904e+01 2.310e+01 2.577e+01 2.983e+01 3.825e+01, threshold=5.154e+01, percent-clipped=0.0 2024-08-19 11:06:51,170 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.51 vs. limit=10.0 2024-08-19 11:07:13,510 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4442390.0, ans=0.125 2024-08-19 11:07:16,396 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.44 vs. limit=6.0 2024-08-19 11:07:25,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14150, loss[loss=0.08247, beats_loss=0.013, ecapa_loss=0.0001595, whisper_loss=0.06788, over 18127.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001403, whisper_loss=0.09095, over 3875414.20 frames.
], batch size: 79, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:07:25,716 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4442490.0, ans=0.125 2024-08-19 11:07:30,644 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 17 from Vox, 38 from AS 2024-08-19 11:07:43,645 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=4442590.0, ans=22.5 2024-08-19 11:07:49,583 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 23 from Vox, 42 from AS 2024-08-19 11:07:51,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4442590.0, ans=0.125 2024-08-19 11:08:01,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4442690.0, ans=0.1 2024-08-19 11:08:07,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4442690.0, ans=0.125 2024-08-19 11:08:30,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.52 vs. limit=12.0 2024-08-19 11:08:49,870 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4442890.0, ans=0.125 2024-08-19 11:08:52,551 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14200, loss[loss=0.09213, beats_loss=0.01309, ecapa_loss=0.0001294, whisper_loss=0.07775, over 17622.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01046, ecapa_loss=0.0001403, whisper_loss=0.09039, over 3891255.24 frames. ], batch size: 73, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:09:05,089 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts.
14 from LS+wenet, 16 from Vox, 24 from AS 2024-08-19 11:09:09,348 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4443090.0, ans=0.09899494936611666 2024-08-19 11:09:20,771 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 11 from LS+wenet, 16 from Vox, 35 from AS 2024-08-19 11:09:31,787 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4443190.0, ans=0.125 2024-08-19 11:09:32,563 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.16 vs. limit=15.0 2024-08-19 11:09:36,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4443190.0, ans=0.0 2024-08-19 11:09:42,452 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4443190.0, ans=0.125 2024-08-19 11:09:49,521 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4443290.0, ans=0.1 2024-08-19 11:09:50,400 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.637e+01 2.398e+01 2.635e+01 3.009e+01 4.372e+01, threshold=5.270e+01, percent-clipped=0.0 2024-08-19 11:09:51,205 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4443290.0, ans=0.2 2024-08-19 11:10:27,901 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14250, loss[loss=0.1076, beats_loss=0.01005, ecapa_loss=0.0001588, whisper_loss=0.09598, over 20731.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01041, ecapa_loss=0.0001398, whisper_loss=0.09052, over 3868453.46 frames.
], batch size: 87, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:10:56,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4443590.0, ans=0.2 2024-08-19 11:11:15,504 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 28 from Vox, 30 from AS 2024-08-19 11:11:20,361 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4443690.0, ans=0.0 2024-08-19 11:11:21,913 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 22 from Vox, 25 from AS 2024-08-19 11:11:23,955 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 29 from LS+wenet, 23 from Vox, 32 from AS 2024-08-19 11:11:36,287 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4443790.0, ans=0.125 2024-08-19 11:11:45,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4443890.0, ans=0.125 2024-08-19 11:11:50,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4443890.0, ans=0.125 2024-08-19 11:11:58,573 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14300, loss[loss=0.09529, beats_loss=0.01206, ecapa_loss=0.000132, whisper_loss=0.08191, over 15351.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.0001397, whisper_loss=0.09055, over 3890492.08 frames. ], batch size: 61, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:12:14,210 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 27 from LS+wenet, 27 from Vox, 32 from AS 2024-08-19 11:12:22,803 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts.
26 from LS+wenet, 25 from Vox, 33 from AS 2024-08-19 11:12:30,748 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4444090.0, ans=0.1 2024-08-19 11:12:35,041 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4444190.0, ans=0.1 2024-08-19 11:12:49,999 WARNING [optim.py:496] (3/4) Scaling gradients by 0.05855522304773331, model_norm_threshold=52.698848724365234 2024-08-19 11:12:50,162 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.0.layers.1.norm.log_scale with proportion 0.25, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.023e+05, grad_sumsq=2.023e+05, orig_rms_sq=1.000e+00 2024-08-19 11:12:56,610 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.784e+01 2.271e+01 2.635e+01 2.896e+01 9.000e+02, threshold=5.270e+01, percent-clipped=1.0 2024-08-19 11:13:06,473 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2024-08-19 11:13:11,506 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 27 from LS+wenet, 21 from Vox, 27 from AS 2024-08-19 11:13:11,808 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4444390.0, ans=0.0 2024-08-19 11:13:14,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4444390.0, ans=0.0 2024-08-19 11:13:23,588 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 from AS 2024-08-19 11:13:29,807 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14350, loss[loss=0.08463, beats_loss=0.01053, ecapa_loss=0.000163, whisper_loss=0.07247, over 18974.00 frames.
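The WARNING above (`Scaling gradients by 0.05855522304773331, model_norm_threshold=52.698848724365234`) is consistent with multiplying gradients by threshold / total_norm whenever the total gradient norm exceeds the threshold: 52.6988 / 0.05855522 ≈ 900, matching the 9.000e+02 maximum in the quartile line logged a few seconds later. A hedged sketch of that behavior (the helper name is illustrative):

```python
def grad_scale(total_norm, threshold):
    """Scale factor applied to gradients: threshold / total_norm when the
    gradient norm exceeds the threshold, otherwise 1.0 (no scaling)."""
    return min(1.0, threshold / total_norm)

# The spike in the log: a total norm of about 9.000e+02 against
# model_norm_threshold=52.698848724365234
s = grad_scale(900.0, 52.698848724365234)
print(s)  # close to the logged scaling factor 0.05855522304773331
```

The "Parameter dominating tot_sumsq" line that follows identifies which parameter contributed most to the spike (here `encoder.encoders.0.layers.1.norm.log_scale`, with proportion 0.25 of the squared norm).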
], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001423, whisper_loss=0.09051, over 3883534.07 frames. ], batch size: 81, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:13:33,063 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 31 from LS+wenet, 18 from Vox, 38 from AS 2024-08-19 11:13:33,606 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.35 vs. limit=15.0 2024-08-19 11:13:42,196 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 25 from Vox, 27 from AS 2024-08-19 11:13:46,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4444590.0, ans=0.0 2024-08-19 11:13:51,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4444590.0, ans=0.0 2024-08-19 11:13:53,861 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=15.0 2024-08-19 11:13:56,108 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=4444590.0, ans=15.0 2024-08-19 11:14:15,139 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 12 from Vox, 28 from AS 2024-08-19 11:14:35,682 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:15:04,210 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14400, loss[loss=0.1274, beats_loss=0.009041, ecapa_loss=0.0001271, whisper_loss=0.1171, over 23008.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01035, ecapa_loss=0.0001427, whisper_loss=0.09088, over 3909620.23 frames. ], batch size: 87, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:15:19,299 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts.
12 from LS+wenet, 22 from Vox, 19 from AS 2024-08-19 11:15:54,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4445190.0, ans=0.125 2024-08-19 11:15:55,312 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 23 from Vox, 42 from AS 2024-08-19 11:16:01,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.952e+01 2.334e+01 2.545e+01 2.906e+01 1.418e+02, threshold=5.090e+01, percent-clipped=1.0 2024-08-19 11:16:35,729 INFO [train_multi_KD3.py:1116] (3/4) Epoch 30, batch 14450, loss[loss=0.1025, beats_loss=0.008892, ecapa_loss=0.0001698, whisper_loss=0.09193, over 13714.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01046, ecapa_loss=0.0001426, whisper_loss=0.09018, over 3897821.25 frames. ], batch size: 55, lr: 2.02e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:16:38,273 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-08-19 11:16:40,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4445490.0, ans=0.1 2024-08-19 11:16:42,057 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 34 from LS+wenet, 19 from Vox, 35 from AS 2024-08-19 11:16:52,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4445590.0, ans=0.125 2024-08-19 11:17:21,125 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:18:30,849 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 0, loss[loss=0.09276, beats_loss=0.008195, ecapa_loss=0.0001958, whisper_loss=0.08261, over 21523.00 frames. ], tot_loss[loss=0.09276, beats_loss=0.008195, ecapa_loss=0.0001958, whisper_loss=0.08261, over 21523.00 frames.
], batch size: 90, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:18:30,849 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 11:19:12,002 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on ASR_libri: loss=0.2529, beats_loss=0, ecapa_loss=0.0005129, whisper_loss=0.2478, over 922467.00 frames. 2024-08-19 11:19:31,578 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003975, beats_loss=0, ecapa_loss=0.0003975, whisper_loss=0, over 939242.00 frames. 2024-08-19 11:20:31,768 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.7280, 2.2285, 2.4733, 2.3635], device='cuda:3') 2024-08-19 11:20:58,127 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on AT_audioset: loss=0.02297, beats_loss=0.02297, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 11:20:58,130 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 11:21:44,957 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 24 from LS+wenet, 26 from Vox, 41 from AS 2024-08-19 11:23:29,003 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 21 from LS+wenet, 13 from Vox, 23 from AS 2024-08-19 11:24:18,263 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.992e+01 2.403e+01 2.773e+01 3.093e+01 8.282e+01, threshold=5.547e+01, percent-clipped=1.0 2024-08-19 11:24:48,295 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS 2024-08-19 11:24:48,512 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4446290.0, ans=0.125 2024-08-19 11:24:55,633 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 50, loss[loss=0.1134, beats_loss=0.008443, ecapa_loss=0.0001413, whisper_loss=0.1035, over 18493.00 frames.
], tot_loss[loss=0.09822, beats_loss=0.009471, ecapa_loss=0.0001452, whisper_loss=0.08729, over 917155.76 frames. ], batch size: 71, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:25:05,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4446390.0, ans=0.125 2024-08-19 11:25:15,095 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 from AS 2024-08-19 11:25:30,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4446390.0, ans=0.125 2024-08-19 11:25:43,868 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4446490.0, ans=0.0 2024-08-19 11:25:58,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4446490.0, ans=0.125 2024-08-19 11:26:21,394 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 18 from LS+wenet, 14 from Vox, 25 from AS 2024-08-19 11:26:36,600 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 from AS 2024-08-19 11:28:32,922 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4446890.0, ans=0.1 2024-08-19 11:28:34,196 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 100, loss[loss=0.114, beats_loss=0.007985, ecapa_loss=0.0001449, whisper_loss=0.1045, over 23744.00 frames. ], tot_loss[loss=0.09863, beats_loss=0.009259, ecapa_loss=0.0001452, whisper_loss=0.08792, over 1572824.92 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:28:36,463 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.78 vs.
limit=22.5 2024-08-19 11:28:38,362 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.90 vs. limit=22.5 2024-08-19 11:28:46,187 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4446890.0, ans=0.125 2024-08-19 11:29:42,223 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4447090.0, ans=0.04949747468305833 2024-08-19 11:30:02,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4447190.0, ans=0.125 2024-08-19 11:30:04,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4447190.0, ans=0.125 2024-08-19 11:30:21,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.140e+01 2.553e+01 2.874e+01 3.336e+01 1.667e+02, threshold=5.748e+01, percent-clipped=2.0 2024-08-19 11:30:39,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 150, loss[loss=0.09446, beats_loss=0.01127, ecapa_loss=0.0001658, whisper_loss=0.08153, over 21062.00 frames. ], tot_loss[loss=0.09996, beats_loss=0.009308, ecapa_loss=0.0001454, whisper_loss=0.0892, over 2074334.16 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:30:48,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4447390.0, ans=0.0 2024-08-19 11:30:57,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4447390.0, ans=0.0 2024-08-19 11:31:10,053 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.98 vs. 
limit=22.5 2024-08-19 11:31:38,044 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 from AS 2024-08-19 11:31:48,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4447690.0, ans=0.125 2024-08-19 11:31:56,526 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=12.0 2024-08-19 11:32:00,461 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 20 from Vox, 28 from AS 2024-08-19 11:32:14,175 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 11:32:17,627 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 200, loss[loss=0.09958, beats_loss=0.01176, ecapa_loss=0.0001686, whisper_loss=0.08613, over 19826.00 frames. ], tot_loss[loss=0.101, beats_loss=0.009439, ecapa_loss=0.0001474, whisper_loss=0.0901, over 2451929.28 frames. ], batch size: 84, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:32:25,201 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4447890.0, ans=0.125 2024-08-19 11:32:36,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4447990.0, ans=0.125 2024-08-19 11:32:38,866 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts.
18 from LS+wenet, 18 from Vox, 25 from AS 2024-08-19 11:32:40,466 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4447990.0, ans=0.125 2024-08-19 11:32:40,524 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4447990.0, ans=0.04949747468305833 2024-08-19 11:32:53,705 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4448090.0, ans=0.0 2024-08-19 11:33:02,067 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2024-08-19 11:33:05,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4448090.0, ans=0.125 2024-08-19 11:33:13,960 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4448190.0, ans=0.0 2024-08-19 11:33:19,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2024-08-19 11:33:20,408 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 16 from Vox, 48 from AS 2024-08-19 11:33:20,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4448190.0, ans=0.04949747468305833 2024-08-19 11:33:22,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4448190.0, ans=0.125 2024-08-19 11:33:26,150 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4448190.0, ans=0.125 2024-08-19 11:33:27,432 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
24 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-19 11:33:32,443 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.882e+01 2.387e+01 2.633e+01 3.004e+01 1.700e+02, threshold=5.266e+01, percent-clipped=1.0 2024-08-19 11:33:34,258 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 11 from Vox, 32 fro AS 2024-08-19 11:33:38,886 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4448290.0, ans=0.125 2024-08-19 11:33:38,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4448290.0, ans=0.1 2024-08-19 11:33:48,410 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 250, loss[loss=0.0776, beats_loss=0.009017, ecapa_loss=0.0001735, whisper_loss=0.06685, over 13375.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.00977, ecapa_loss=0.000147, whisper_loss=0.08895, over 2746731.14 frames. ], batch size: 54, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:33:57,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4448390.0, ans=0.0 2024-08-19 11:33:58,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4448390.0, ans=0.1 2024-08-19 11:34:47,476 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 11:34:50,140 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=19.50 vs. 
limit=15.0 2024-08-19 11:34:51,400 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4448690.0, ans=0.125 2024-08-19 11:34:57,853 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 26 from Vox, 25 fro AS 2024-08-19 11:35:12,333 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2024-08-19 11:35:16,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4448890.0, ans=0.0 2024-08-19 11:35:17,460 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 300, loss[loss=0.07166, beats_loss=0.01206, ecapa_loss=0.0001103, whisper_loss=0.0585, over 15905.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.009922, ecapa_loss=0.0001458, whisper_loss=0.08912, over 2983733.49 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 11:35:47,815 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 26 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 11:35:54,693 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4449090.0, ans=0.0 2024-08-19 11:35:59,057 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4449090.0, ans=0.2 2024-08-19 11:36:03,296 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4449090.0, ans=0.0 2024-08-19 11:36:08,807 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
30 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 11:36:13,546 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4449190.0, ans=0.0 2024-08-19 11:36:18,209 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4449190.0, ans=0.0 2024-08-19 11:36:21,001 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 11:36:27,379 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4449190.0, ans=0.1 2024-08-19 11:36:29,492 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 26 from LS+wenet, 27 from Vox, 31 fro AS 2024-08-19 11:36:32,050 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.740e+01 2.162e+01 2.418e+01 2.637e+01 1.048e+02, threshold=4.837e+01, percent-clipped=1.0 2024-08-19 11:36:45,212 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 350, loss[loss=0.1135, beats_loss=0.01002, ecapa_loss=0.0001504, whisper_loss=0.102, over 15139.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01002, ecapa_loss=0.0001438, whisper_loss=0.08947, over 3171424.96 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:36:53,869 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 17 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 11:37:16,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4449490.0, ans=0.125 2024-08-19 11:37:21,827 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=4449590.0, ans=6.0 2024-08-19 11:37:37,035 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
17 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 11:38:01,349 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. limit=15.0 2024-08-19 11:38:15,409 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 400, loss[loss=0.09981, beats_loss=0.01248, ecapa_loss=9.861e-05, whisper_loss=0.08635, over 19439.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01014, ecapa_loss=0.0001421, whisper_loss=0.08923, over 3307516.28 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:38:25,230 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0 2024-08-19 11:38:26,723 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.197e+01 2024-08-19 11:38:29,170 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 35 from LS+wenet, 13 from Vox, 36 fro AS 2024-08-19 11:38:31,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4449990.0, ans=10.0 2024-08-19 11:39:12,752 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=4450190.0, ans=0.02 2024-08-19 11:39:14,005 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 11:39:16,154 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 20 from Vox, 26 fro AS 2024-08-19 11:39:28,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4450290.0, ans=0.125 2024-08-19 11:39:32,656 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.597e+01 2.340e+01 2.668e+01 2.900e+01 1.466e+02, threshold=5.335e+01, percent-clipped=2.0 2024-08-19 11:39:35,347 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4450290.0, ans=0.0 2024-08-19 11:39:36,992 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4450290.0, ans=0.0 2024-08-19 11:39:46,329 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 450, loss[loss=0.07573, beats_loss=0.01138, ecapa_loss=0.0001262, whisper_loss=0.06309, over 18041.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01016, ecapa_loss=0.0001426, whisper_loss=0.0895, over 3439818.98 frames. ], batch size: 74, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:39:48,028 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 27 from Vox, 27 fro AS 2024-08-19 11:39:54,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4450390.0, ans=0.125 2024-08-19 11:39:57,569 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:40:14,025 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 34 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 11:40:25,249 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 36 from LS+wenet, 18 from Vox, 34 fro AS 2024-08-19 11:40:29,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4450590.0, ans=0.1 2024-08-19 11:40:59,012 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 
19 from LS+wenet, 25 from Vox, 26 fro AS 2024-08-19 11:41:01,759 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4450790.0, ans=0.1 2024-08-19 11:41:07,090 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 11:41:14,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 500, loss[loss=0.1103, beats_loss=0.01031, ecapa_loss=0.0001421, whisper_loss=0.09861, over 15838.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01017, ecapa_loss=0.000142, whisper_loss=0.08998, over 3536163.11 frames. ], batch size: 60, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:41:18,608 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 14 from Vox, 23 fro AS 2024-08-19 11:41:30,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=12.0 2024-08-19 11:41:35,396 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4450990.0, ans=0.125 2024-08-19 11:41:43,882 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 11:41:48,891 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
34 from LS+wenet, 20 from Vox, 33 fro AS 2024-08-19 11:41:54,302 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4451090.0, ans=0.125 2024-08-19 11:41:59,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4451090.0, ans=0.0 2024-08-19 11:42:19,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4451190.0, ans=0.125 2024-08-19 11:42:29,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4451290.0, ans=0.1 2024-08-19 11:42:31,923 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.755e+01 2.295e+01 2.524e+01 2.813e+01 3.428e+01, threshold=5.047e+01, percent-clipped=0.0 2024-08-19 11:42:37,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4451290.0, ans=0.0 2024-08-19 11:42:41,524 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 12 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 11:42:42,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4451290.0, ans=0.2 2024-08-19 11:42:44,368 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4451390.0, ans=0.2 2024-08-19 11:42:45,230 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 550, loss[loss=0.1178, beats_loss=0.009723, ecapa_loss=0.0001412, whisper_loss=0.1067, over 21924.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01019, ecapa_loss=0.0001424, whisper_loss=0.08919, over 3608648.12 frames. ], batch size: 85, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:42:45,362 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
23 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 11:42:50,263 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4451390.0, ans=0.0 2024-08-19 11:42:55,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4451390.0, ans=0.0 2024-08-19 11:42:57,046 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4451390.0, ans=0.125 2024-08-19 11:43:12,884 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 11:43:19,613 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.92 vs. limit=10.0 2024-08-19 11:43:20,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4451590.0, ans=0.125 2024-08-19 11:43:24,134 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 11:43:30,756 WARNING [optim.py:496] (3/4) Scaling gradients by 0.03087170422077179, model_norm_threshold=50.47096252441406 2024-08-19 11:43:30,919 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=3.400e+05, grad_sumsq=1.036e+05, orig_rms_sq=3.283e+00 2024-08-19 11:43:32,939 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4451590.0, ans=0.07 2024-08-19 11:43:40,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4451690.0, ans=0.125 2024-08-19 11:43:43,890 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
32 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 11:43:54,567 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4451790.0, ans=0.125 2024-08-19 11:43:59,931 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.38 vs. limit=12.0 2024-08-19 11:44:06,910 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 25 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 11:44:15,153 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 600, loss[loss=0.1132, beats_loss=0.008237, ecapa_loss=0.0001903, whisper_loss=0.103, over 18889.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01022, ecapa_loss=0.0001412, whisper_loss=0.08957, over 3684830.91 frames. ], batch size: 79, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:44:47,185 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 11:44:49,815 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4452090.0, ans=0.0 2024-08-19 11:44:58,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4452090.0, ans=0.125 2024-08-19 11:45:03,263 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 11:45:03,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4452190.0, ans=0.0 2024-08-19 11:45:03,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.63 vs. 
limit=22.5 2024-08-19 11:45:22,049 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4452290.0, ans=0.125 2024-08-19 11:45:23,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4452290.0, ans=0.125 2024-08-19 11:45:26,798 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.947e+01 2.353e+01 2.580e+01 2.899e+01 1.635e+03, threshold=5.160e+01, percent-clipped=1.0 2024-08-19 11:45:33,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4452290.0, ans=0.1 2024-08-19 11:45:39,598 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=10.0 2024-08-19 11:45:40,112 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 650, loss[loss=0.1064, beats_loss=0.01036, ecapa_loss=0.0001319, whisper_loss=0.09472, over 20272.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01022, ecapa_loss=0.0001412, whisper_loss=0.08982, over 3693875.28 frames. ], batch size: 77, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:45:43,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=7.46 vs. limit=12.0 2024-08-19 11:46:01,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4452490.0, ans=0.1 2024-08-19 11:46:14,148 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 11:46:34,790 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2024-08-19 11:46:47,719 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 
30 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 11:47:02,540 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 18 from Vox, 21 fro AS 2024-08-19 11:47:08,481 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 700, loss[loss=0.09547, beats_loss=0.01003, ecapa_loss=0.0001752, whisper_loss=0.08369, over 15706.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01027, ecapa_loss=0.0001409, whisper_loss=0.08964, over 3697419.37 frames. ], batch size: 66, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:47:13,105 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4452890.0, ans=0.0 2024-08-19 11:47:13,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4452890.0, ans=0.2 2024-08-19 11:47:16,899 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 15 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-19 11:47:37,274 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.14 vs. limit=22.5 2024-08-19 11:47:49,831 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4453090.0, ans=0.2 2024-08-19 11:47:54,643 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4453090.0, ans=0.125 2024-08-19 11:48:05,721 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4453190.0, ans=0.07 2024-08-19 11:48:08,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. 
limit=15.0 2024-08-19 11:48:14,374 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4453190.0, ans=0.125 2024-08-19 11:48:20,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4453190.0, ans=0.1 2024-08-19 11:48:28,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.237e+01 2.442e+01 2.764e+01 3.734e+01, threshold=4.884e+01, percent-clipped=0.0 2024-08-19 11:48:40,060 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:48:42,446 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 750, loss[loss=0.1046, beats_loss=0.01132, ecapa_loss=0.0001238, whisper_loss=0.09203, over 20298.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01032, ecapa_loss=0.0001407, whisper_loss=0.08933, over 3729273.36 frames. ], batch size: 76, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:48:44,297 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4453390.0, ans=0.125 2024-08-19 11:49:06,568 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.52 vs. limit=15.0 2024-08-19 11:49:23,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=4453590.0, ans=0.125 2024-08-19 11:50:10,545 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 800, loss[loss=0.07853, beats_loss=0.01405, ecapa_loss=0.0001229, whisper_loss=0.06325, over 22337.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01032, ecapa_loss=0.0001396, whisper_loss=0.08919, over 3751450.03 frames. 
], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:50:11,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4453890.0, ans=0.125 2024-08-19 11:50:14,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4453890.0, ans=0.125 2024-08-19 11:50:29,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4453990.0, ans=0.2 2024-08-19 11:50:31,822 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4453990.0, ans=0.1 2024-08-19 11:50:39,145 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 26 from LS+wenet, 13 from Vox, 35 fro AS 2024-08-19 11:50:48,413 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4454090.0, ans=0.125 2024-08-19 11:50:53,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4454090.0, ans=0.2 2024-08-19 11:51:01,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4454090.0, ans=0.2 2024-08-19 11:51:03,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4454090.0, ans=0.125 2024-08-19 11:51:28,463 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.853e+01 2.213e+01 2.446e+01 2.635e+01 4.034e+01, threshold=4.891e+01, percent-clipped=0.0 2024-08-19 11:51:37,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4454290.0, ans=0.2 2024-08-19 11:51:45,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 850, loss[loss=0.1134, beats_loss=0.009507, 
ecapa_loss=0.0001371, whisper_loss=0.1026, over 18295.00 frames. ], tot_loss[loss=0.1005, beats_loss=0.01034, ecapa_loss=0.0001401, whisper_loss=0.08875, over 3766962.55 frames. ], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:51:47,543 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=4454390.0, ans=0.125 2024-08-19 11:51:49,564 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4454390.0, ans=0.125 2024-08-19 11:52:42,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=5.78 vs. limit=12.0 2024-08-19 11:52:53,559 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4454690.0, ans=0.1 2024-08-19 11:53:03,875 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=22.5 2024-08-19 11:53:04,873 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=4454790.0, ans=0.2 2024-08-19 11:53:06,651 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4454790.0, ans=0.125 2024-08-19 11:53:11,261 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 11:53:15,943 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 900, loss[loss=0.105, beats_loss=0.009183, ecapa_loss=0.0001543, whisper_loss=0.09422, over 15463.00 frames. ], tot_loss[loss=0.09966, beats_loss=0.01035, ecapa_loss=0.0001399, whisper_loss=0.08791, over 3776169.74 frames. 
], batch size: 57, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:53:28,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4454890.0, ans=0.0 2024-08-19 11:53:35,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4454990.0, ans=0.125 2024-08-19 11:53:41,437 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 19 from LS+wenet, 11 from Vox, 26 fro AS 2024-08-19 11:53:53,172 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4455090.0, ans=0.1 2024-08-19 11:53:54,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4455090.0, ans=0.0 2024-08-19 11:54:05,439 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.01 vs. 
limit=22.5 2024-08-19 11:54:06,900 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4455090.0, ans=0.125 2024-08-19 11:54:11,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4455190.0, ans=0.125 2024-08-19 11:54:19,604 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=4455190.0, ans=0.2 2024-08-19 11:54:33,153 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.411e+01 2.627e+01 3.077e+01 2.229e+02, threshold=5.255e+01, percent-clipped=1.0 2024-08-19 11:54:44,026 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4455290.0, ans=0.125 2024-08-19 11:54:49,772 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 950, loss[loss=0.09973, beats_loss=0.01137, ecapa_loss=0.0001515, whisper_loss=0.08685, over 22031.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01029, ecapa_loss=0.0001403, whisper_loss=0.0885, over 3783592.23 frames. ], batch size: 90, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:54:53,851 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 14 from Vox, 38 fro AS 2024-08-19 11:54:57,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=15.0 2024-08-19 11:55:09,718 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=4455490.0, ans=0.95 2024-08-19 11:55:14,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.83 vs. limit=15.0 2024-08-19 11:55:28,013 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 
26 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 11:55:42,883 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4455690.0, ans=0.1 2024-08-19 11:56:06,755 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 17 from LS+wenet, 33 from Vox, 27 fro AS 2024-08-19 11:56:16,790 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 19 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 11:56:17,841 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4455790.0, ans=0.125 2024-08-19 11:56:22,902 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1000, loss[loss=0.09693, beats_loss=0.01212, ecapa_loss=0.0001498, whisper_loss=0.08331, over 18975.00 frames. ], tot_loss[loss=0.09971, beats_loss=0.01034, ecapa_loss=0.000139, whisper_loss=0.08799, over 3781435.65 frames. ], batch size: 76, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:56:23,996 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4455890.0, ans=0.125 2024-08-19 11:56:28,752 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 14 from Vox, 35 fro AS 2024-08-19 11:56:31,183 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4455890.0, ans=0.0 2024-08-19 11:56:34,799 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 11:56:39,206 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:56:46,500 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. 
limit=10.0 2024-08-19 11:56:57,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4455990.0, ans=0.0 2024-08-19 11:56:59,499 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4455990.0, ans=0.125 2024-08-19 11:57:02,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4456090.0, ans=0.1 2024-08-19 11:57:09,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=15.0 2024-08-19 11:57:09,631 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 25 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 11:57:11,739 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 11:57:13,546 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 11:57:17,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4456190.0, ans=0.125 2024-08-19 11:57:40,064 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.835e+01 2.280e+01 2.488e+01 2.674e+01 4.382e+01, threshold=4.976e+01, percent-clipped=0.0 2024-08-19 11:57:43,276 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=4456290.0, ans=0.2 2024-08-19 11:57:47,517 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4456290.0, ans=0.125 2024-08-19 11:57:57,856 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1050, loss[loss=0.1015, beats_loss=0.009393, ecapa_loss=0.000146, whisper_loss=0.09065, over 16906.00 frames. 
], tot_loss[loss=0.09917, beats_loss=0.01037, ecapa_loss=0.0001387, whisper_loss=0.08741, over 3755944.74 frames. ], batch size: 67, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:57:59,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4456390.0, ans=0.125 2024-08-19 11:58:09,738 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 22 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 11:58:25,246 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4456490.0, ans=0.0 2024-08-19 11:58:28,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4456490.0, ans=0.0 2024-08-19 11:58:34,184 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4456590.0, ans=0.125 2024-08-19 11:58:53,930 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 24 from LS+wenet, 14 from Vox, 31 fro AS 2024-08-19 11:59:02,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4456690.0, ans=0.0 2024-08-19 11:59:17,927 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4456790.0, ans=0.2 2024-08-19 11:59:17,990 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 11:59:28,713 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 22 from LS+wenet, 20 from Vox, 17 fro AS 2024-08-19 11:59:29,760 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1100, loss[loss=0.1147, beats_loss=0.006946, ecapa_loss=0.0001487, whisper_loss=0.1063, over 15259.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01026, ecapa_loss=0.0001392, whisper_loss=0.08933, over 3788783.91 frames. 
], batch size: 59, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 11:59:44,417 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 11:59:46,382 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 12:00:14,780 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 17 from Vox, 44 fro AS 2024-08-19 12:00:44,490 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=4457190.0, ans=0.2 2024-08-19 12:00:58,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4457290.0, ans=0.05 2024-08-19 12:00:59,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.924e+01 2.392e+01 2.653e+01 2.944e+01 8.213e+01, threshold=5.307e+01, percent-clipped=1.0 2024-08-19 12:00:59,754 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4457290.0, ans=0.125 2024-08-19 12:01:10,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1150, loss[loss=0.06516, beats_loss=0.01357, ecapa_loss=0.0001366, whisper_loss=0.05022, over 21431.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01022, ecapa_loss=0.0001384, whisper_loss=0.09003, over 3806217.15 frames. 
], batch size: 91, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:01:41,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4457490.0, ans=0.125 2024-08-19 12:01:56,750 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4457590.0, ans=0.0 2024-08-19 12:01:58,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=4457590.0, ans=15.0 2024-08-19 12:02:01,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4457590.0, ans=0.125 2024-08-19 12:02:04,512 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 28 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 12:02:15,944 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.14 vs. limit=15.0 2024-08-19 12:02:32,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4457790.0, ans=10.0 2024-08-19 12:02:43,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1200, loss[loss=0.09979, beats_loss=0.01074, ecapa_loss=0.0001398, whisper_loss=0.08765, over 23048.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01023, ecapa_loss=0.0001397, whisper_loss=0.08923, over 3805290.81 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:02:47,260 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
16 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 12:02:47,695 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4457890.0, ans=0.1 2024-08-19 12:03:38,885 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4458190.0, ans=0.125 2024-08-19 12:03:44,378 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 23 from Vox, 33 fro AS 2024-08-19 12:03:49,504 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 27 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 12:03:56,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.81 vs. limit=12.0 2024-08-19 12:04:04,454 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.704e+01 2.388e+01 2.700e+01 3.012e+01 6.792e+01, threshold=5.400e+01, percent-clipped=1.0 2024-08-19 12:04:10,452 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 20 from Vox, 38 fro AS 2024-08-19 12:04:18,793 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1250, loss[loss=0.1172, beats_loss=0.009075, ecapa_loss=0.00019, whisper_loss=0.1063, over 15405.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01037, ecapa_loss=0.0001393, whisper_loss=0.08903, over 3850811.17 frames. ], batch size: 63, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:04:36,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4458390.0, ans=0.125 2024-08-19 12:04:44,872 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 
14 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 12:04:45,166 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4458490.0, ans=0.2 2024-08-19 12:04:46,907 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 16 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 12:04:49,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4458490.0, ans=0.1 2024-08-19 12:05:19,034 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 20 from LS+wenet, 14 from Vox, 33 fro AS 2024-08-19 12:05:53,331 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 23 from LS+wenet, 17 from Vox, 31 fro AS 2024-08-19 12:05:56,302 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1300, loss[loss=0.1081, beats_loss=0.01051, ecapa_loss=0.0001746, whisper_loss=0.0958, over 17391.00 frames. ], tot_loss[loss=0.1006, beats_loss=0.01041, ecapa_loss=0.0001393, whisper_loss=0.08879, over 3835814.25 frames. 
], batch size: 71, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:06:01,364 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4458890.0, ans=0.1 2024-08-19 12:06:01,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=4458890.0, ans=10.0 2024-08-19 12:06:04,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4458890.0, ans=0.125 2024-08-19 12:06:08,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4458890.0, ans=0.125 2024-08-19 12:06:16,372 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4458990.0, ans=0.0 2024-08-19 12:06:26,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4458990.0, ans=0.125 2024-08-19 12:06:40,043 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4459090.0, ans=0.1 2024-08-19 12:06:46,557 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=4459090.0, ans=0.2 2024-08-19 12:07:02,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=4459190.0, ans=0.2 2024-08-19 12:07:03,617 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 13 from Vox, 33 fro AS 2024-08-19 12:07:03,867 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4459190.0, ans=0.125 2024-08-19 12:07:17,541 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.691e+01 2.249e+01 2.463e+01 2.703e+01 4.319e+01, threshold=4.927e+01, percent-clipped=0.0 2024-08-19 12:07:20,088 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 23 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 12:07:32,289 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1350, loss[loss=0.1063, beats_loss=0.013, ecapa_loss=0.0001165, whisper_loss=0.09213, over 21496.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01042, ecapa_loss=0.0001382, whisper_loss=0.08908, over 3844768.86 frames. ], batch size: 85, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:07:57,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4459490.0, ans=0.2 2024-08-19 12:08:31,686 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 12:08:42,674 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 17 from LS+wenet, 23 from Vox, 34 fro AS 2024-08-19 12:08:58,291 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 19 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 12:08:58,527 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4459790.0, ans=0.0 2024-08-19 12:09:04,121 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1400, loss[loss=0.1112, beats_loss=0.0118, ecapa_loss=0.0001541, whisper_loss=0.09789, over 17875.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01044, ecapa_loss=0.0001374, whisper_loss=0.08913, over 3849729.44 frames. 
], batch size: 74, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:09:04,969 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4459890.0, ans=0.125 2024-08-19 12:09:07,683 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 28 from LS+wenet, 23 from Vox, 27 fro AS 2024-08-19 12:09:10,590 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 12:09:22,378 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 36 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 12:09:25,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4459990.0, ans=0.1 2024-08-19 12:09:30,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4459990.0, ans=0.0 2024-08-19 12:10:02,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4460190.0, ans=0.0 2024-08-19 12:10:11,814 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.831e+01 2.232e+01 2.433e+01 2.753e+01 5.485e+01, threshold=4.866e+01, percent-clipped=1.0 2024-08-19 12:10:14,763 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 12:10:26,660 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1450, loss[loss=0.07501, beats_loss=0.01146, ecapa_loss=0.000113, whisper_loss=0.06241, over 16852.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01032, ecapa_loss=0.0001382, whisper_loss=0.08932, over 3820980.18 frames. 
], batch size: 67, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:10:52,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4460390.0, ans=0.125 2024-08-19 12:11:22,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4460490.0, ans=0.125 2024-08-19 12:11:23,093 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 22 from Vox, 43 fro AS 2024-08-19 12:11:23,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4460490.0, ans=0.07 2024-08-19 12:11:35,323 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4460590.0, ans=0.125 2024-08-19 12:11:41,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4460590.0, ans=0.0 2024-08-19 12:11:41,503 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4460590.0, ans=0.125 2024-08-19 12:11:49,000 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4460690.0, ans=0.0 2024-08-19 12:11:49,281 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2024-08-19 12:11:56,870 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 
16 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 12:11:57,654 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.690e+05 2024-08-19 12:12:24,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4460790.0, ans=0.0 2024-08-19 12:12:26,465 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1500, loss[loss=0.08901, beats_loss=0.01204, ecapa_loss=0.000155, whisper_loss=0.07542, over 17589.00 frames. ], tot_loss[loss=0.1002, beats_loss=0.01039, ecapa_loss=0.0001377, whisper_loss=0.08842, over 3802543.55 frames. ], batch size: 75, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:12:36,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4460890.0, ans=0.2 2024-08-19 12:12:48,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4460990.0, ans=0.125 2024-08-19 12:12:49,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4460990.0, ans=0.2 2024-08-19 12:12:57,830 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 12:13:19,131 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 12:13:19,612 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4461190.0, ans=0.0 2024-08-19 12:13:41,565 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.893e+01 2.290e+01 2.508e+01 2.829e+01 3.950e+01, threshold=5.015e+01, percent-clipped=0.0 2024-08-19 12:13:56,723 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1550, loss[loss=0.09317, beats_loss=0.0106, ecapa_loss=0.0001521, whisper_loss=0.08105, over 17708.00 frames. 
], tot_loss[loss=0.1008, beats_loss=0.01028, ecapa_loss=0.0001386, whisper_loss=0.08917, over 3826020.78 frames. ], batch size: 73, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:13:56,902 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 23 from LS+wenet, 25 from Vox, 43 fro AS 2024-08-19 12:13:59,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4461390.0, ans=0.5 2024-08-19 12:14:12,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4461390.0, ans=0.125 2024-08-19 12:14:23,985 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4461490.0, ans=0.125 2024-08-19 12:14:27,812 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4461490.0, ans=0.125 2024-08-19 12:14:59,749 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 12:15:10,689 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 27 from LS+wenet, 23 from Vox, 43 fro AS 2024-08-19 12:15:17,810 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4461790.0, ans=0.125 2024-08-19 12:15:29,688 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1600, loss[loss=0.09517, beats_loss=0.01334, ecapa_loss=0.0001314, whisper_loss=0.08052, over 22185.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01028, ecapa_loss=0.0001382, whisper_loss=0.08912, over 3821297.01 frames. 
], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:15:38,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4461890.0, ans=0.125 2024-08-19 12:15:38,952 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.17 vs. limit=15.0 2024-08-19 12:15:47,762 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=4461990.0, ans=0.09899494936611666 2024-08-19 12:16:16,405 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 fro AS 2024-08-19 12:16:21,312 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 28 from Vox, 24 fro AS 2024-08-19 12:16:34,647 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-08-19 12:16:45,903 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.703e+01 2.361e+01 2.575e+01 2.927e+01 3.831e+01, threshold=5.149e+01, percent-clipped=0.0 2024-08-19 12:16:52,574 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4462290.0, ans=0.125 2024-08-19 12:16:57,821 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1650, loss[loss=0.1318, beats_loss=0.007017, ecapa_loss=0.0001448, whisper_loss=0.1234, over 19324.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01023, ecapa_loss=0.0001377, whisper_loss=0.09031, over 3830965.60 frames. 
], batch size: 72, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:17:10,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4462390.0, ans=0.125 2024-08-19 12:17:27,005 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4462490.0, ans=0.0 2024-08-19 12:17:59,477 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 12:18:20,049 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5 2024-08-19 12:18:24,289 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=9.01 vs. limit=12.0 2024-08-19 12:18:29,768 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:18:35,151 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1700, loss[loss=0.0981, beats_loss=0.006684, ecapa_loss=0.0001579, whisper_loss=0.08984, over 14666.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01021, ecapa_loss=0.0001369, whisper_loss=0.09081, over 3830623.56 frames. ], batch size: 56, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:18:45,911 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 12:18:47,062 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4462890.0, ans=0.0 2024-08-19 12:18:52,156 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4462890.0, ans=0.0 2024-08-19 12:18:54,981 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
25 from LS+wenet, 21 from Vox, 46 fro AS 2024-08-19 12:19:25,145 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=12.0 2024-08-19 12:19:31,218 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4463190.0, ans=0.125 2024-08-19 12:19:36,316 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.57 vs. limit=15.0 2024-08-19 12:19:43,453 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=5.605e+01 2024-08-19 12:19:44,234 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.732e+01 2.356e+01 2.588e+01 2.791e+01 7.851e+01, threshold=5.177e+01, percent-clipped=4.0 2024-08-19 12:19:49,916 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4463290.0, ans=0.2 2024-08-19 12:19:59,276 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 29 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 12:20:00,327 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4463390.0, ans=0.2 2024-08-19 12:20:00,996 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1750, loss[loss=0.1049, beats_loss=0.01126, ecapa_loss=0.0001306, whisper_loss=0.09238, over 23483.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01024, ecapa_loss=0.0001371, whisper_loss=0.0904, over 3840624.94 frames. ], batch size: 93, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:20:10,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4463390.0, ans=0.125 2024-08-19 12:20:49,286 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
40 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 12:20:52,374 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.80 vs. limit=22.5 2024-08-19 12:21:16,727 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4463690.0, ans=0.2 2024-08-19 12:21:18,753 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=4463690.0, ans=0.125 2024-08-19 12:21:45,351 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1800, loss[loss=0.1041, beats_loss=0.008551, ecapa_loss=0.0001561, whisper_loss=0.09398, over 23168.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01024, ecapa_loss=0.0001375, whisper_loss=0.09036, over 3833773.94 frames. ], batch size: 92, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:21:47,127 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4463890.0, ans=0.05 2024-08-19 12:21:48,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4463890.0, ans=0.125 2024-08-19 12:21:58,244 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 12:22:09,885 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 12:22:10,216 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=4.592e+00 2024-08-19 12:22:14,981 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=4463990.0, ans=0.02 2024-08-19 12:22:20,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4463990.0, ans=0.125 2024-08-19 12:22:23,133 INFO [train_multi_KD3.py:844] (3/4) A total of 53 cuts. 15 from LS+wenet, 11 from Vox, 27 fro AS 2024-08-19 12:22:36,581 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4464090.0, ans=0.125 2024-08-19 12:22:41,503 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 28 from LS+wenet, 7 from Vox, 27 fro AS 2024-08-19 12:22:46,653 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 23 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 12:22:57,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4464190.0, ans=0.0 2024-08-19 12:23:03,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4464190.0, ans=0.0 2024-08-19 12:23:21,352 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.692e+01 2.194e+01 2.455e+01 2.727e+01 5.354e+01, threshold=4.909e+01, percent-clipped=1.0 2024-08-19 12:23:42,026 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1850, loss[loss=0.07628, beats_loss=0.01112, ecapa_loss=0.0001834, whisper_loss=0.06333, over 15923.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01018, ecapa_loss=0.0001383, whisper_loss=0.09054, over 3825986.39 frames. 
], batch size: 69, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:23:49,394 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 25 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 12:24:07,487 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4464390.0, ans=0.125 2024-08-19 12:24:37,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4464590.0, ans=0.125 2024-08-19 12:24:45,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.04 vs. limit=5.0 2024-08-19 12:25:14,550 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 21 from LS+wenet, 24 from Vox, 32 fro AS 2024-08-19 12:25:15,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4464690.0, ans=0.0 2024-08-19 12:25:18,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=22.5 2024-08-19 12:25:30,731 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4464790.0, ans=0.125 2024-08-19 12:25:44,913 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1900, loss[loss=0.1128, beats_loss=0.009827, ecapa_loss=0.0001696, whisper_loss=0.1012, over 21529.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01026, ecapa_loss=0.0001377, whisper_loss=0.08997, over 3825584.75 frames. ], batch size: 88, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:25:50,848 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. 
limit=15.0 2024-08-19 12:25:56,107 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.53 vs. limit=15.0 2024-08-19 12:26:08,303 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4464890.0, ans=0.0 2024-08-19 12:26:17,050 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 10 from LS+wenet, 24 from Vox, 27 fro AS 2024-08-19 12:26:42,211 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.22 vs. limit=15.0 2024-08-19 12:26:49,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4465090.0, ans=0.04949747468305833 2024-08-19 12:26:52,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4465090.0, ans=0.0 2024-08-19 12:26:56,110 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2024-08-19 12:27:33,014 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.271e+01 2.463e+01 2.812e+01 5.945e+01, threshold=4.926e+01, percent-clipped=2.0 2024-08-19 12:27:47,048 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-08-19 12:27:55,047 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 1950, loss[loss=0.08556, beats_loss=0.01171, ecapa_loss=0.0001434, whisper_loss=0.07241, over 14355.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01027, ecapa_loss=0.0001382, whisper_loss=0.0895, over 3796157.76 frames. 
], batch size: 58, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:28:27,771 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 30 from Vox, 31 fro AS 2024-08-19 12:28:40,244 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4465490.0, ans=0.1 2024-08-19 12:28:47,368 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 28 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 12:28:48,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4465590.0, ans=0.1 2024-08-19 12:29:19,919 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.84 vs. limit=12.0 2024-08-19 12:29:25,928 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:29:43,405 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2000, loss[loss=0.109, beats_loss=0.008717, ecapa_loss=0.000144, whisper_loss=0.0988, over 17035.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01029, ecapa_loss=0.0001363, whisper_loss=0.08964, over 3800169.94 frames. ], batch size: 65, lr: 1.99e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:29:44,022 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 23 from LS+wenet, 23 from Vox, 29 fro AS 2024-08-19 12:29:57,678 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 30 from Vox, 29 fro AS 2024-08-19 12:30:11,805 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 
22 from LS+wenet, 22 from Vox, 21 fro AS 2024-08-19 12:30:14,129 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4465990.0, ans=0.125 2024-08-19 12:30:27,221 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4466090.0, ans=0.125 2024-08-19 12:30:43,107 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 31 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 12:31:01,173 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.903e+01 2.320e+01 2.587e+01 3.185e+01 3.788e+02, threshold=5.175e+01, percent-clipped=4.0 2024-08-19 12:31:14,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2050, loss[loss=0.07034, beats_loss=0.01094, ecapa_loss=0.0001492, whisper_loss=0.05791, over 14010.00 frames. ], tot_loss[loss=0.101, beats_loss=0.0103, ecapa_loss=0.0001362, whisper_loss=0.08936, over 3793702.94 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:31:25,446 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=4466390.0, ans=0.05 2024-08-19 12:31:36,802 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 12:31:40,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4466490.0, ans=0.0 2024-08-19 12:31:44,439 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 12:31:53,387 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 22 from LS+wenet, 14 from Vox, 29 fro AS 2024-08-19 12:32:12,023 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
38 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 12:32:16,665 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4466690.0, ans=0.0 2024-08-19 12:32:38,423 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=15.0 2024-08-19 12:32:48,698 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2100, loss[loss=0.09222, beats_loss=0.01201, ecapa_loss=0.0001607, whisper_loss=0.0786, over 17500.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01041, ecapa_loss=0.0001344, whisper_loss=0.0894, over 3815730.14 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:33:20,184 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 18 from LS+wenet, 20 from Vox, 40 fro AS 2024-08-19 12:33:26,109 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.59 vs. 
limit=15.0 2024-08-19 12:33:34,053 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4467090.0, ans=0.125 2024-08-19 12:33:47,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4467190.0, ans=0.0 2024-08-19 12:33:54,064 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4467190.0, ans=0.125 2024-08-19 12:34:11,019 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4467290.0, ans=0.2 2024-08-19 12:34:11,800 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.656e+01 2.305e+01 2.500e+01 2.818e+01 4.309e+01, threshold=5.000e+01, percent-clipped=0.0 2024-08-19 12:34:13,251 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:34:23,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4467290.0, ans=0.2 2024-08-19 12:34:24,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4467290.0, ans=0.125 2024-08-19 12:34:29,976 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2150, loss[loss=0.08862, beats_loss=0.01222, ecapa_loss=0.0001606, whisper_loss=0.07479, over 20864.00 frames. ], tot_loss[loss=0.1007, beats_loss=0.01054, ecapa_loss=0.0001335, whisper_loss=0.08883, over 3814259.89 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:34:52,879 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 16 from LS+wenet, 27 from Vox, 25 fro AS 2024-08-19 12:34:59,776 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 
23 from LS+wenet, 28 from Vox, 31 fro AS 2024-08-19 12:35:10,615 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4467590.0, ans=0.125 2024-08-19 12:35:15,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4467590.0, ans=0.125 2024-08-19 12:35:26,011 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 12:35:29,681 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4467690.0, ans=0.125 2024-08-19 12:35:31,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4467690.0, ans=0.125 2024-08-19 12:35:31,321 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4467690.0, ans=0.2 2024-08-19 12:35:43,601 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 29 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 12:35:46,100 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4467790.0, ans=0.125 2024-08-19 12:35:48,928 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 18 from LS+wenet, 19 from Vox, 26 fro AS 2024-08-19 12:35:53,958 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4467790.0, ans=0.125 2024-08-19 12:35:54,900 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 19 from LS+wenet, 21 from Vox, 30 fro AS 2024-08-19 12:35:59,472 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
14 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 12:36:01,844 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2200, loss[loss=0.07848, beats_loss=0.01175, ecapa_loss=0.0001318, whisper_loss=0.06541, over 14914.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01058, ecapa_loss=0.0001347, whisper_loss=0.08883, over 3830853.00 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:36:08,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4467890.0, ans=0.125 2024-08-19 12:36:29,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.01 vs. limit=15.0 2024-08-19 12:36:34,580 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 34 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 12:36:40,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=4468090.0, ans=12.0 2024-08-19 12:36:45,825 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4468090.0, ans=0.0 2024-08-19 12:37:09,083 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4468190.0, ans=0.1 2024-08-19 12:37:19,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.73 vs. 
limit=10.0 2024-08-19 12:37:21,684 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.887e+01 2.215e+01 2.622e+01 2.935e+01 2.277e+02, threshold=5.244e+01, percent-clipped=1.0 2024-08-19 12:37:27,282 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=4468290.0, ans=10.0 2024-08-19 12:37:31,368 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=3.87 vs. limit=15.0 2024-08-19 12:37:38,370 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2250, loss[loss=0.1226, beats_loss=0.009005, ecapa_loss=0.0001368, whisper_loss=0.1122, over 23114.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.000135, whisper_loss=0.08948, over 3852696.85 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:37:45,885 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.67 vs. limit=22.5 2024-08-19 12:37:47,774 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4468390.0, ans=0.2 2024-08-19 12:38:14,457 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 39 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 12:38:22,193 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.40 vs. 
limit=15.0 2024-08-19 12:38:31,279 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 12:38:33,576 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4468590.0, ans=0.125 2024-08-19 12:38:35,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4468590.0, ans=0.125 2024-08-19 12:38:37,908 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-19 12:39:00,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=4468690.0, ans=0.5 2024-08-19 12:39:01,984 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 17 from LS+wenet, 19 from Vox, 31 fro AS 2024-08-19 12:39:24,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2300, loss[loss=0.1018, beats_loss=0.009316, ecapa_loss=0.0001816, whisper_loss=0.09066, over 20893.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01054, ecapa_loss=0.0001367, whisper_loss=0.09024, over 3911550.70 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 12:39:25,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4468890.0, ans=0.125 2024-08-19 12:39:41,648 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
28 from LS+wenet, 10 from Vox, 35 fro AS 2024-08-19 12:39:47,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=4468990.0, ans=0.125 2024-08-19 12:40:18,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4469190.0, ans=0.0 2024-08-19 12:40:24,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4469190.0, ans=0.0 2024-08-19 12:40:40,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.821e+01 2.339e+01 2.578e+01 2.838e+01 4.503e+01, threshold=5.155e+01, percent-clipped=1.0 2024-08-19 12:40:48,185 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.799e+05 2024-08-19 12:40:54,649 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2350, loss[loss=0.1176, beats_loss=0.008886, ecapa_loss=0.0001685, whisper_loss=0.107, over 22228.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01045, ecapa_loss=0.0001379, whisper_loss=0.09059, over 3875577.67 frames. ], batch size: 93, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:41:33,478 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 21 from LS+wenet, 12 from Vox, 22 fro AS 2024-08-19 12:41:40,786 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 24 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 12:42:05,537 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4469690.0, ans=0.125 2024-08-19 12:42:09,568 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 19 from LS+wenet, 25 from Vox, 37 fro AS 2024-08-19 12:42:18,976 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.80 vs. 
limit=22.5 2024-08-19 12:42:26,134 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4469790.0, ans=0.0 2024-08-19 12:42:27,568 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 23 from LS+wenet, 17 from Vox, 37 fro AS 2024-08-19 12:42:28,525 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2400, loss[loss=0.1025, beats_loss=0.01138, ecapa_loss=0.0001063, whisper_loss=0.0901, over 21028.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01044, ecapa_loss=0.0001379, whisper_loss=0.09078, over 3887046.32 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:42:38,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4469890.0, ans=0.0 2024-08-19 12:42:48,011 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4469990.0, ans=0.125 2024-08-19 12:42:48,079 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4469990.0, ans=0.1 2024-08-19 12:42:53,538 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
23 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-19 12:43:10,363 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=4470090.0, ans=0.05 2024-08-19 12:43:22,020 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4470090.0, ans=0.0 2024-08-19 12:43:24,229 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4470190.0, ans=0.0 2024-08-19 12:43:33,265 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4470190.0, ans=0.035 2024-08-19 12:43:45,931 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.014e+01 2.427e+01 2.678e+01 2.913e+01 4.859e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-19 12:43:46,459 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 12:43:57,766 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2450, loss[loss=0.1275, beats_loss=0.008819, ecapa_loss=0.0001283, whisper_loss=0.1174, over 13999.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01039, ecapa_loss=0.0001389, whisper_loss=0.09115, over 3884037.91 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:44:01,678 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-08-19 12:44:02,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4470390.0, ans=0.1 2024-08-19 12:44:04,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4470390.0, ans=0.0 2024-08-19 12:44:06,095 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
25 from LS+wenet, 29 from Vox, 31 fro AS 2024-08-19 12:44:09,649 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4470390.0, ans=0.0 2024-08-19 12:44:17,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4470490.0, ans=0.125 2024-08-19 12:44:23,307 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4470490.0, ans=10.0 2024-08-19 12:44:27,786 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 24 from LS+wenet, 16 from Vox, 25 fro AS 2024-08-19 12:44:38,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4470590.0, ans=0.0 2024-08-19 12:44:43,309 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4470590.0, ans=0.125 2024-08-19 12:44:59,070 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=9.393e-02 2024-08-19 12:45:02,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=4470690.0, ans=0.2 2024-08-19 12:45:23,726 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 15 from Vox, 23 fro AS 2024-08-19 12:45:25,558 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2500, loss[loss=0.1051, beats_loss=0.00867, ecapa_loss=0.0001405, whisper_loss=0.09503, over 15186.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01038, ecapa_loss=0.0001392, whisper_loss=0.09073, over 3843543.26 frames. ], batch size: 57, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:45:34,956 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.00 vs. 
limit=22.5 2024-08-19 12:45:37,415 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 20 from Vox, 43 fro AS 2024-08-19 12:46:02,135 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4471090.0, ans=0.125 2024-08-19 12:46:09,318 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 20 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 12:46:12,218 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 17 from LS+wenet, 15 from Vox, 24 fro AS 2024-08-19 12:46:16,345 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4471090.0, ans=0.125 2024-08-19 12:46:25,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4471190.0, ans=0.125 2024-08-19 12:46:44,875 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.775e+01 2.332e+01 2.578e+01 2.868e+01 6.708e+01, threshold=5.156e+01, percent-clipped=1.0 2024-08-19 12:46:50,155 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2024-08-19 12:46:59,035 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2550, loss[loss=0.07992, beats_loss=0.01239, ecapa_loss=0.000111, whisper_loss=0.06642, over 17768.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01039, ecapa_loss=0.0001382, whisper_loss=0.09035, over 3879876.71 frames. ], batch size: 69, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:47:06,760 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 26 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 12:47:21,383 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 26 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 12:47:56,026 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
23 from LS+wenet, 20 from Vox, 47 fro AS 2024-08-19 12:48:06,466 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 38 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 12:48:14,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4471790.0, ans=0.125 2024-08-19 12:48:21,619 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.41 vs. limit=15.0 2024-08-19 12:48:23,854 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2600, loss[loss=0.08596, beats_loss=0.0122, ecapa_loss=0.0001088, whisper_loss=0.07267, over 16607.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01032, ecapa_loss=0.0001384, whisper_loss=0.09129, over 3877643.29 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:48:26,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4471890.0, ans=0.0 2024-08-19 12:48:37,509 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=4.42 vs. limit=12.0 2024-08-19 12:48:51,305 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4471990.0, ans=0.125 2024-08-19 12:48:59,039 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-19 12:49:23,088 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
27 from LS+wenet, 24 from Vox, 42 fro AS 2024-08-19 12:49:36,711 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4472290.0, ans=0.125 2024-08-19 12:49:38,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4472290.0, ans=0.125 2024-08-19 12:49:39,402 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.768e+01 2.420e+01 2.627e+01 2.917e+01 8.623e+01, threshold=5.255e+01, percent-clipped=2.0 2024-08-19 12:49:47,460 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-08-19 12:49:54,075 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2650, loss[loss=0.1174, beats_loss=0.01039, ecapa_loss=0.0001129, whisper_loss=0.1059, over 18803.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001383, whisper_loss=0.09014, over 3917469.98 frames. ], batch size: 70, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:49:58,424 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4472390.0, ans=0.0 2024-08-19 12:50:09,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4472390.0, ans=0.035 2024-08-19 12:50:16,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4472490.0, ans=0.0 2024-08-19 12:50:45,391 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
21 from LS+wenet, 21 from Vox, 21 fro AS 2024-08-19 12:51:06,995 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4472790.0, ans=0.0 2024-08-19 12:51:23,211 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2700, loss[loss=0.1163, beats_loss=0.00659, ecapa_loss=0.0001366, whisper_loss=0.1083, over 15881.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001377, whisper_loss=0.08968, over 3899295.82 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:51:26,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4472890.0, ans=0.07 2024-08-19 12:51:30,321 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-08-19 12:51:58,188 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 20 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 12:52:05,084 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 14 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 12:52:33,357 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 21 from LS+wenet, 22 from Vox, 33 fro AS 2024-08-19 12:52:33,664 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4473290.0, ans=0.125 2024-08-19 12:52:34,862 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.658e+01 2.264e+01 2.469e+01 2.717e+01 4.475e+01, threshold=4.939e+01, percent-clipped=0.0 2024-08-19 12:52:47,135 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 27 from LS+wenet, 20 from Vox, 25 fro AS 2024-08-19 12:52:48,106 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2750, loss[loss=0.113, beats_loss=0.008139, ecapa_loss=0.000153, whisper_loss=0.1034, over 18357.00 frames. 
], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001382, whisper_loss=0.08936, over 3889253.00 frames. ], batch size: 72, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:52:53,923 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 13 from Vox, 30 fro AS 2024-08-19 12:53:06,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2024-08-19 12:53:11,684 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.60 vs. limit=15.0 2024-08-19 12:53:12,961 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4473490.0, ans=0.125 2024-08-19 12:53:16,864 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.15 vs. limit=15.0 2024-08-19 12:53:27,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4473590.0, ans=0.125 2024-08-19 12:53:39,724 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 26 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 12:53:56,337 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4473690.0, ans=0.0 2024-08-19 12:54:03,299 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 29 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 12:54:17,673 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2800, loss[loss=0.08282, beats_loss=0.01331, ecapa_loss=0.0001145, whisper_loss=0.06837, over 14265.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01038, ecapa_loss=0.0001389, whisper_loss=0.08992, over 3875364.76 frames. 
], batch size: 59, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:54:18,361 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.57 vs. limit=10.0 2024-08-19 12:54:29,004 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.195e+05 2024-08-19 12:54:31,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4473890.0, ans=0.1 2024-08-19 12:54:38,095 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4473990.0, ans=0.1 2024-08-19 12:54:42,385 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.85 vs. limit=6.0 2024-08-19 12:54:49,864 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 fro AS 2024-08-19 12:55:14,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4474190.0, ans=0.025 2024-08-19 12:55:26,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=15.0 2024-08-19 12:55:33,023 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 32 from LS+wenet, 24 from Vox, 37 fro AS 2024-08-19 12:55:33,886 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.412e+01 2.633e+01 2.866e+01 1.596e+02, threshold=5.265e+01, percent-clipped=3.0 2024-08-19 12:55:50,095 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2850, loss[loss=0.1124, beats_loss=0.01019, ecapa_loss=0.0001186, whisper_loss=0.101, over 19272.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.01039, ecapa_loss=0.0001386, whisper_loss=0.0902, over 3871451.28 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:55:54,068 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 12:55:56,021 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4474390.0, ans=0.1 2024-08-19 12:56:05,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4474490.0, ans=0.125 2024-08-19 12:56:06,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4474490.0, ans=0.125 2024-08-19 12:56:15,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4474490.0, ans=0.1 2024-08-19 12:56:48,250 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=4474690.0, ans=0.2 2024-08-19 12:56:51,871 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-08-19 12:57:01,248 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 12:57:14,193 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2900, loss[loss=0.1161, beats_loss=0.007749, ecapa_loss=0.0001621, whisper_loss=0.1068, over 15889.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.0001395, whisper_loss=0.09061, over 3885585.56 frames. 
], batch size: 62, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:57:44,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4474990.0, ans=0.0 2024-08-19 12:57:54,996 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2024-08-19 12:58:18,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4475190.0, ans=0.125 2024-08-19 12:58:27,187 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 34 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 12:58:29,415 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4475290.0, ans=0.125 2024-08-19 12:58:30,156 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.344e+01 2.665e+01 2.999e+01 6.021e+01, threshold=5.330e+01, percent-clipped=1.0 2024-08-19 12:58:44,964 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 2950, loss[loss=0.1064, beats_loss=0.009274, ecapa_loss=0.000142, whisper_loss=0.0957, over 19923.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01041, ecapa_loss=0.0001402, whisper_loss=0.09005, over 3877615.03 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 12:59:09,618 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.25 vs. limit=15.0 2024-08-19 13:00:05,527 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5 2024-08-19 13:00:13,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3000, loss[loss=0.1168, beats_loss=0.007978, ecapa_loss=0.0001521, whisper_loss=0.1073, over 19983.00 frames. 
], tot_loss[loss=0.1027, beats_loss=0.01034, ecapa_loss=0.0001407, whisper_loss=0.09094, over 3875511.29 frames. ], batch size: 81, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:00:13,957 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 13:00:59,287 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on ASR_libri: loss=0.2548, beats_loss=0, ecapa_loss=0.0005195, whisper_loss=0.2496, over 922467.00 frames. 2024-08-19 13:01:17,423 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003921, beats_loss=0, ecapa_loss=0.0003921, whisper_loss=0, over 939242.00 frames. 2024-08-19 13:02:05,292 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.2.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.2756, 2.1030, 2.2463, 2.0792], device='cuda:3') 2024-08-19 13:03:07,083 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on AT_audioset: loss=0.02304, beats_loss=0.02304, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 13:03:07,087 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 13:03:17,523 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 21 from LS+wenet, 22 from Vox, 45 fro AS 2024-08-19 13:03:21,687 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 30 from LS+wenet, 25 from Vox, 36 fro AS 2024-08-19 13:03:26,330 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 17 from Vox, 43 fro AS 2024-08-19 13:03:33,568 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 16 from Vox, 40 fro AS 2024-08-19 13:03:50,993 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 13:03:51,290 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4476090.0, ans=0.1 2024-08-19 13:04:06,591 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4476190.0, ans=0.0 2024-08-19 13:04:06,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4476190.0, ans=0.0 2024-08-19 13:04:06,668 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4476190.0, ans=0.125 2024-08-19 13:04:16,073 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.953e+01 2.421e+01 2.691e+01 3.119e+01 4.810e+01, threshold=5.383e+01, percent-clipped=0.0 2024-08-19 13:04:16,373 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 15 from Vox, 36 fro AS 2024-08-19 13:04:30,497 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3050, loss[loss=0.09612, beats_loss=0.01066, ecapa_loss=0.0001457, whisper_loss=0.084, over 22600.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01032, ecapa_loss=0.0001417, whisper_loss=0.09086, over 3883433.44 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:04:36,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4476390.0, ans=0.1 2024-08-19 13:04:37,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4476390.0, ans=0.0 2024-08-19 13:04:56,886 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.20 vs. 
limit=6.0 2024-08-19 13:05:39,251 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4476690.0, ans=0.125 2024-08-19 13:05:58,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4476790.0, ans=0.1 2024-08-19 13:06:00,816 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3100, loss[loss=0.08359, beats_loss=0.01236, ecapa_loss=0.0001399, whisper_loss=0.06984, over 22190.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01039, ecapa_loss=0.0001406, whisper_loss=0.09071, over 3880497.29 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:06:10,763 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 13:06:18,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4476990.0, ans=0.1 2024-08-19 13:06:31,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=4476990.0, ans=0.2 2024-08-19 13:06:47,128 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 25 from LS+wenet, 26 from Vox, 35 fro AS 2024-08-19 13:06:47,804 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4477090.0, ans=0.0 2024-08-19 13:07:14,613 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.808e+01 2.300e+01 2.518e+01 2.909e+01 4.343e+01, threshold=5.036e+01, percent-clipped=0.0 2024-08-19 13:07:27,346 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3150, loss[loss=0.1091, beats_loss=0.009301, ecapa_loss=0.0001418, whisper_loss=0.09842, over 22533.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001406, whisper_loss=0.09135, over 3893250.09 frames. 
], batch size: 88, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:07:44,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.57 vs. limit=15.0 2024-08-19 13:07:47,450 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 21 from LS+wenet, 22 from Vox, 30 fro AS 2024-08-19 13:07:48,779 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 14 from LS+wenet, 18 from Vox, 32 fro AS 2024-08-19 13:07:50,417 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 18 from Vox, 40 fro AS 2024-08-19 13:08:09,855 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4477590.0, ans=0.125 2024-08-19 13:08:25,460 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4477690.0, ans=0.1 2024-08-19 13:08:28,854 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4477690.0, ans=0.125 2024-08-19 13:08:30,003 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 32 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 13:08:37,097 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.361e+05 2024-08-19 13:08:50,884 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3200, loss[loss=0.1064, beats_loss=0.0125, ecapa_loss=0.0001011, whisper_loss=0.09289, over 16718.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01041, ecapa_loss=0.0001404, whisper_loss=0.0911, over 3878326.60 frames. ], batch size: 63, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:09:22,724 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.16 vs. 
limit=22.5 2024-08-19 13:09:41,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4478090.0, ans=0.125 2024-08-19 13:09:47,340 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4478190.0, ans=0.035 2024-08-19 13:09:50,390 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4478190.0, ans=0.125 2024-08-19 13:09:54,118 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=4478190.0, ans=0.2 2024-08-19 13:09:57,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4478190.0, ans=0.125 2024-08-19 13:10:03,345 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 24 from LS+wenet, 18 from Vox, 29 fro AS 2024-08-19 13:10:05,687 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.832e+01 2.281e+01 2.603e+01 2.850e+01 4.454e+01, threshold=5.206e+01, percent-clipped=0.0 2024-08-19 13:10:19,922 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3250, loss[loss=0.09533, beats_loss=0.00818, ecapa_loss=0.0001476, whisper_loss=0.08567, over 15082.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01036, ecapa_loss=0.0001412, whisper_loss=0.09105, over 3870670.97 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 5.764607523034235e+17 2024-08-19 13:10:24,540 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. 
limit=6.0 2024-08-19 13:10:48,545 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=4478490.0, ans=0.125 2024-08-19 13:11:05,295 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4478590.0, ans=0.0 2024-08-19 13:11:23,936 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=22.5 2024-08-19 13:11:27,284 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 13:11:30,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=4478790.0, ans=0.05 2024-08-19 13:11:33,238 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2024-08-19 13:11:35,139 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.49 vs. limit=10.0 2024-08-19 13:11:42,616 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3300, loss[loss=0.1172, beats_loss=0.008564, ecapa_loss=0.0001469, whisper_loss=0.1072, over 22852.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.0001418, whisper_loss=0.09079, over 3917178.95 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:11:45,960 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 
20 from LS+wenet, 19 from Vox, 44 fro AS 2024-08-19 13:11:51,068 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4478890.0, ans=0.0 2024-08-19 13:12:00,402 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4478990.0, ans=0.0 2024-08-19 13:12:20,880 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09731415659189224, model_norm_threshold=52.06379318237305 2024-08-19 13:12:21,048 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.62, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.783e+05, grad_sumsq=1.704e+07, orig_rms_sq=1.046e-02 2024-08-19 13:12:35,989 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 13:12:42,464 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4479190.0, ans=0.025 2024-08-19 13:12:45,366 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4479290.0, ans=0.0 2024-08-19 13:12:50,636 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.743e+01 2.318e+01 2.677e+01 3.024e+01 5.350e+02, threshold=5.354e+01, percent-clipped=4.0 2024-08-19 13:12:55,337 WARNING [optim.py:496] (3/4) Scaling gradients by 0.09374216943979263, model_norm_threshold=53.53926467895508 2024-08-19 13:12:55,517 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder.encoders.2.out_combiner.bypass_scale with proportion 0.08, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=2.490e+04, grad_sumsq=4.301e+04, orig_rms_sq=5.788e-01 2024-08-19 13:12:58,724 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 
32 from LS+wenet, 16 from Vox, 33 fro AS 2024-08-19 13:13:01,549 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3350, loss[loss=0.1121, beats_loss=0.009686, ecapa_loss=0.0001214, whisper_loss=0.1012, over 23930.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01034, ecapa_loss=0.0001422, whisper_loss=0.09168, over 3928789.27 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:13:13,566 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4479390.0, ans=0.125 2024-08-19 13:13:19,180 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4479490.0, ans=0.125 2024-08-19 13:13:19,197 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4479490.0, ans=0.125 2024-08-19 13:13:33,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4479590.0, ans=0.1 2024-08-19 13:13:40,001 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=4479590.0, ans=0.125 2024-08-19 13:13:43,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4479590.0, ans=0.09899494936611666 2024-08-19 13:13:45,767 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
20 from LS+wenet, 20 from Vox, 24 fro AS 2024-08-19 13:13:53,743 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=4479690.0, ans=0.0 2024-08-19 13:13:58,120 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4479690.0, ans=0.125 2024-08-19 13:14:16,254 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3400, loss[loss=0.1005, beats_loss=0.01292, ecapa_loss=0.0001275, whisper_loss=0.08635, over 20008.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001432, whisper_loss=0.09097, over 3938417.29 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:14:19,582 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 27 from LS+wenet, 19 from Vox, 25 fro AS 2024-08-19 13:14:30,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=4479990.0, ans=0.2 2024-08-19 13:14:38,672 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.94 vs. 
limit=15.0 2024-08-19 13:14:41,380 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4479990.0, ans=0.125 2024-08-19 13:15:10,726 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4480190.0, ans=0.0 2024-08-19 13:15:10,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=4480190.0, ans=0.05 2024-08-19 13:15:16,711 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.065e-02 2024-08-19 13:15:22,753 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.735e+01 2.253e+01 2.575e+01 3.055e+01 5.711e+02, threshold=5.150e+01, percent-clipped=3.0 2024-08-19 13:15:33,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3450, loss[loss=0.1203, beats_loss=0.008977, ecapa_loss=0.000152, whisper_loss=0.1098, over 19925.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01048, ecapa_loss=0.0001433, whisper_loss=0.09066, over 3935316.90 frames. ], batch size: 78, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:15:34,480 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2024-08-19 13:15:51,434 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4480490.0, ans=0.1 2024-08-19 13:15:55,970 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4480490.0, ans=0.125 2024-08-19 13:16:02,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4480490.0, ans=0.125 2024-08-19 13:16:08,657 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 
18 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 13:16:21,462 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4480690.0, ans=0.0 2024-08-19 13:16:24,092 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 19 from Vox, 40 fro AS 2024-08-19 13:16:35,275 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2024-08-19 13:16:37,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=4480790.0, ans=0.0 2024-08-19 13:16:45,878 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0 2024-08-19 13:16:50,980 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3500, loss[loss=0.1028, beats_loss=0.01169, ecapa_loss=0.0001221, whisper_loss=0.08984, over 23276.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01046, ecapa_loss=0.000143, whisper_loss=0.09107, over 3942630.03 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:16:54,291 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 31 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 13:16:55,845 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 35 from LS+wenet, 15 from Vox, 40 fro AS 2024-08-19 13:16:58,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4480890.0, ans=0.1 2024-08-19 13:17:06,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4480990.0, ans=0.2 2024-08-19 13:17:09,398 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 25 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 13:17:12,383 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 
29 from LS+wenet, 19 from Vox, 45 fro AS 2024-08-19 13:17:20,324 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 19 from LS+wenet, 30 from Vox, 45 fro AS 2024-08-19 13:17:48,722 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4481190.0, ans=0.1 2024-08-19 13:17:54,814 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4481290.0, ans=0.125 2024-08-19 13:17:56,268 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4481290.0, ans=0.125 2024-08-19 13:17:57,048 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.595e+01 2.291e+01 2.477e+01 2.753e+01 3.837e+01, threshold=4.954e+01, percent-clipped=0.0 2024-08-19 13:18:05,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4481390.0, ans=0.125 2024-08-19 13:18:06,613 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3550, loss[loss=0.1096, beats_loss=0.009689, ecapa_loss=9.667e-05, whisper_loss=0.09899, over 18030.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.000142, whisper_loss=0.0913, over 3940635.00 frames. ], batch size: 66, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:18:10,650 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 25 from Vox, 44 fro AS 2024-08-19 13:18:17,987 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 26 from LS+wenet, 28 from Vox, 39 fro AS 2024-08-19 13:18:19,769 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
28 from LS+wenet, 18 from Vox, 43 fro AS 2024-08-19 13:18:25,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4481490.0, ans=0.125 2024-08-19 13:18:42,119 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 25 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 13:18:47,495 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-08-19 13:19:04,745 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 31 from LS+wenet, 14 from Vox, 25 fro AS 2024-08-19 13:19:05,590 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.77 vs. limit=10.0 2024-08-19 13:19:17,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2024-08-19 13:19:25,141 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3600, loss[loss=0.09621, beats_loss=0.009491, ecapa_loss=0.000136, whisper_loss=0.08536, over 19583.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.09125, over 3943646.81 frames. ], batch size: 79, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:19:25,691 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4481890.0, ans=0.5 2024-08-19 13:19:44,816 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 
23 from LS+wenet, 16 from Vox, 30 fro AS 2024-08-19 13:19:49,555 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4481990.0, ans=0.0 2024-08-19 13:19:59,895 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-08-19 13:20:04,067 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 24 from LS+wenet, 26 from Vox, 42 fro AS 2024-08-19 13:20:34,079 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.650e+01 2.290e+01 2.482e+01 2.749e+01 4.098e+01, threshold=4.965e+01, percent-clipped=0.0 2024-08-19 13:20:35,955 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 16 from Vox, 24 fro AS 2024-08-19 13:20:42,088 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4482290.0, ans=0.125 2024-08-19 13:20:44,711 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3650, loss[loss=0.09056, beats_loss=0.009609, ecapa_loss=0.0001664, whisper_loss=0.07929, over 14952.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001408, whisper_loss=0.09064, over 3881153.05 frames. ], batch size: 58, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:20:44,923 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 20 from LS+wenet, 24 from Vox, 31 fro AS 2024-08-19 13:21:20,027 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 23 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 13:21:39,032 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=4482690.0, ans=10.0 2024-08-19 13:21:45,355 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.54 vs. 
limit=15.0 2024-08-19 13:22:00,028 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3700, loss[loss=0.1011, beats_loss=0.01194, ecapa_loss=0.0001343, whisper_loss=0.08778, over 22490.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01046, ecapa_loss=0.0001415, whisper_loss=0.09002, over 3871202.44 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:22:08,861 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 32 from Vox, 33 fro AS 2024-08-19 13:22:32,548 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=22.5 2024-08-19 13:22:35,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 13:22:35,236 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4483090.0, ans=0.0 2024-08-19 13:22:43,181 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 15 from LS+wenet, 21 from Vox, 32 fro AS 2024-08-19 13:22:50,855 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
23 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 13:23:05,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4483290.0, ans=0.1 2024-08-19 13:23:09,235 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.337e+01 2.650e+01 3.046e+01 1.477e+02, threshold=5.301e+01, percent-clipped=3.0 2024-08-19 13:23:09,738 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4483290.0, ans=0.125 2024-08-19 13:23:11,488 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4483290.0, ans=0.125 2024-08-19 13:23:11,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4483290.0, ans=0.0 2024-08-19 13:23:20,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3750, loss[loss=0.1015, beats_loss=0.01142, ecapa_loss=0.0001183, whisper_loss=0.08887, over 15837.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001425, whisper_loss=0.09042, over 3850333.53 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:23:21,008 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 26 from Vox, 31 fro AS 2024-08-19 13:23:41,828 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
22 from LS+wenet, 22 from Vox, 44 fro AS 2024-08-19 13:24:02,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4483590.0, ans=0.1 2024-08-19 13:24:04,601 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08629204332828522, model_norm_threshold=53.00765609741211 2024-08-19 13:24:04,768 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.31, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.164e+05, grad_sumsq=1.113e+07, orig_rms_sq=1.046e-02 2024-08-19 13:24:05,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4483690.0, ans=0.125 2024-08-19 13:24:05,623 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=12.0 2024-08-19 13:24:19,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=4483790.0, ans=0.125 2024-08-19 13:24:24,394 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=4483790.0, ans=0.2 2024-08-19 13:24:33,687 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 17 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 13:24:36,462 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3800, loss[loss=0.1077, beats_loss=0.009635, ecapa_loss=9.894e-05, whisper_loss=0.09705, over 15425.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001416, whisper_loss=0.09039, over 3819195.01 frames. ], batch size: 54, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:24:46,160 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 
25 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-19 13:24:46,728 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.246e+01 2024-08-19 13:25:04,314 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 29 from LS+wenet, 27 from Vox, 35 fro AS 2024-08-19 13:25:10,987 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=2.994e+01 2024-08-19 13:25:12,819 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=4484090.0, ans=0.5 2024-08-19 13:25:19,440 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4484090.0, ans=0.125 2024-08-19 13:25:20,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4484090.0, ans=0.0 2024-08-19 13:25:25,381 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 30 from LS+wenet, 17 from Vox, 28 fro AS 2024-08-19 13:25:45,331 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.415e+01 2.611e+01 2.947e+01 6.143e+02, threshold=5.223e+01, percent-clipped=2.0 2024-08-19 13:25:56,283 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3850, loss[loss=0.09208, beats_loss=0.009138, ecapa_loss=0.000147, whisper_loss=0.08147, over 16395.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0104, ecapa_loss=0.0001409, whisper_loss=0.09032, over 3849408.69 frames. 
], batch size: 69, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:26:06,611 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4484390.0, ans=0.125 2024-08-19 13:26:25,683 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4484490.0, ans=0.125 2024-08-19 13:26:29,072 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4484590.0, ans=0.0 2024-08-19 13:26:31,738 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 28 from LS+wenet, 21 from Vox, 36 fro AS 2024-08-19 13:26:40,921 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 23 from LS+wenet, 10 from Vox, 22 fro AS 2024-08-19 13:26:44,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4484690.0, ans=0.125 2024-08-19 13:26:48,901 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 25 from Vox, 33 fro AS 2024-08-19 13:26:50,489 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4484690.0, ans=0.0 2024-08-19 13:26:58,954 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=4484790.0, ans=15.0 2024-08-19 13:27:06,397 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 13:27:09,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4484790.0, ans=0.125 2024-08-19 13:27:11,944 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3900, loss[loss=0.09718, beats_loss=0.01257, ecapa_loss=0.0001355, whisper_loss=0.08325, over 23044.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01039, ecapa_loss=0.0001411, whisper_loss=0.09038, over 3858849.37 frames. ], batch size: 95, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:27:15,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4484890.0, ans=0.125 2024-08-19 13:27:32,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4484990.0, ans=0.0 2024-08-19 13:27:55,727 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 19 from Vox, 30 from AS 2024-08-19 13:28:02,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=4485190.0, ans=0.2 2024-08-19 13:28:02,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=4485190.0, ans=0.07 2024-08-19 13:28:05,579 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4485190.0, ans=0.0 2024-08-19 13:28:08,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=4485190.0, ans=0.0 2024-08-19 13:28:17,672 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4485290.0, ans=0.0 2024-08-19 13:28:18,587 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.733e+01 2.352e+01 2.565e+01 2.860e+01 1.728e+02, threshold=5.131e+01, percent-clipped=1.0 2024-08-19 13:28:22,641 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=4485290.0, ans=0.2 2024-08-19 13:28:24,095 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
27 from LS+wenet, 29 from Vox, 28 from AS 2024-08-19 13:28:30,314 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 3950, loss[loss=0.1025, beats_loss=0.009208, ecapa_loss=0.0001343, whisper_loss=0.09198, over 16245.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01034, ecapa_loss=0.0001407, whisper_loss=0.09119, over 3863140.76 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:28:30,526 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 18 from Vox, 40 from AS 2024-08-19 13:28:38,297 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 14 from Vox, 31 from AS 2024-08-19 13:28:48,833 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0 2024-08-19 13:28:55,832 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 30 from LS+wenet, 10 from Vox, 37 from AS 2024-08-19 13:28:59,164 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=4485490.0, ans=0.035 2024-08-19 13:29:08,015 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 19 from Vox, 45 from AS 2024-08-19 13:29:08,353 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=4485590.0, ans=0.0 2024-08-19 13:29:16,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4485690.0, ans=0.125 2024-08-19 13:29:25,725 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 29 from LS+wenet, 30 from Vox, 33 from AS 2024-08-19 13:29:43,384 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 16 from Vox, 19 from AS 2024-08-19 13:29:48,651 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4000, loss[loss=0.09204, beats_loss=0.00915, ecapa_loss=0.0001544, whisper_loss=0.08134, over 22262.00 frames. 
], tot_loss[loss=0.1033, beats_loss=0.01031, ecapa_loss=0.0001417, whisper_loss=0.09158, over 3885423.75 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:30:03,004 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 26 from LS+wenet, 24 from Vox, 44 from AS 2024-08-19 13:30:10,894 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 26 from LS+wenet, 26 from Vox, 43 from AS 2024-08-19 13:30:27,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486090.0, ans=0.1 2024-08-19 13:30:43,118 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 11 from Vox, 27 from AS 2024-08-19 13:30:51,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486290.0, ans=0.1 2024-08-19 13:30:54,624 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.618e+01 2.333e+01 2.515e+01 2.857e+01 4.559e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 13:31:02,725 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 24 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 13:31:05,650 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4050, loss[loss=0.09643, beats_loss=0.01089, ecapa_loss=0.0001434, whisper_loss=0.08411, over 20077.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01027, ecapa_loss=0.0001415, whisper_loss=0.09177, over 3881584.70 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:31:19,988 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 
14 from LS+wenet, 20 from Vox, 30 from AS 2024-08-19 13:31:25,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=4486490.0, ans=0.125 2024-08-19 13:31:32,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4486490.0, ans=0.125 2024-08-19 13:32:05,039 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4486690.0, ans=0.125 2024-08-19 13:32:14,845 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4486790.0, ans=0.5 2024-08-19 13:32:14,917 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4486790.0, ans=0.1 2024-08-19 13:32:17,917 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.82 vs. limit=15.0 2024-08-19 13:32:22,438 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 17 from Vox, 24 from AS 2024-08-19 13:32:23,956 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4100, loss[loss=0.1158, beats_loss=0.008856, ecapa_loss=0.0001368, whisper_loss=0.1056, over 17317.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01032, ecapa_loss=0.0001417, whisper_loss=0.09106, over 3871512.77 frames. 
], batch size: 68, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:32:29,148 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4486890.0, ans=0.1 2024-08-19 13:32:37,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=4486890.0, ans=0.05 2024-08-19 13:32:42,349 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4486990.0, ans=0.1 2024-08-19 13:32:42,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4486990.0, ans=0.0 2024-08-19 13:32:49,225 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=4486990.0, ans=0.125 2024-08-19 13:33:09,397 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 23 from Vox, 41 from AS 2024-08-19 13:33:22,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4487190.0, ans=0.125 2024-08-19 13:33:26,428 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 25 from Vox, 32 from AS 2024-08-19 13:33:35,988 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.996e+01 2.243e+01 2.583e+01 2.871e+01 4.368e+01, threshold=5.166e+01, percent-clipped=0.0 2024-08-19 13:33:37,706 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 33 from LS+wenet, 22 from Vox, 38 from AS 2024-08-19 13:33:43,470 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
23 from LS+wenet, 18 from Vox, 43 from AS 2024-08-19 13:33:45,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4487390.0, ans=0.0 2024-08-19 13:33:46,110 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4150, loss[loss=0.1004, beats_loss=0.01111, ecapa_loss=0.0001158, whisper_loss=0.08814, over 18884.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01037, ecapa_loss=0.0001419, whisper_loss=0.09131, over 3876504.99 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:34:10,668 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 30 from Vox, 30 from AS 2024-08-19 13:34:15,951 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 20 from LS+wenet, 16 from Vox, 21 from AS 2024-08-19 13:34:20,637 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2024-08-19 13:34:21,950 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 20 from LS+wenet, 22 from Vox, 23 from AS 2024-08-19 13:35:06,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4487790.0, ans=0.125 2024-08-19 13:35:10,386 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4487790.0, ans=0.2 2024-08-19 13:35:24,901 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4487790.0, ans=0.0 2024-08-19 13:35:27,036 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4200, loss[loss=0.1049, beats_loss=0.00958, ecapa_loss=0.000136, whisper_loss=0.09392, over 15348.00 frames. ], tot_loss[loss=0.1036, beats_loss=0.01037, ecapa_loss=0.0001418, whisper_loss=0.09179, over 3891514.25 frames. 
], batch size: 62, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:35:32,701 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4487890.0, ans=0.125 2024-08-19 13:35:47,829 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.34 vs. limit=15.0 2024-08-19 13:35:55,479 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 29 from LS+wenet, 20 from Vox, 19 from AS 2024-08-19 13:36:06,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4488090.0, ans=0.125 2024-08-19 13:36:22,700 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4488190.0, ans=0.125 2024-08-19 13:36:43,501 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.778e+01 2.376e+01 2.579e+01 2.870e+01 3.813e+01, threshold=5.158e+01, percent-clipped=0.0 2024-08-19 13:36:45,021 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 16 from LS+wenet, 16 from Vox, 44 from AS 2024-08-19 13:36:53,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4250, loss[loss=0.1073, beats_loss=0.009363, ecapa_loss=0.0001476, whisper_loss=0.09648, over 22440.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.09129, over 3877614.43 frames. ], batch size: 92, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:37:17,100 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 33 from LS+wenet, 25 from Vox, 34 from AS 2024-08-19 13:37:27,244 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2024-08-19 13:37:41,428 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
25 from LS+wenet, 24 from Vox, 38 from AS 2024-08-19 13:37:59,436 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=4488790.0, ans=0.0 2024-08-19 13:38:00,635 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 21 from LS+wenet, 13 from Vox, 33 from AS 2024-08-19 13:38:02,098 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 15 from Vox, 28 from AS 2024-08-19 13:38:05,973 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=4488790.0, ans=0.0 2024-08-19 13:38:15,204 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4300, loss[loss=0.08165, beats_loss=0.01112, ecapa_loss=0.000111, whisper_loss=0.06942, over 15323.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.00014, whisper_loss=0.08953, over 3840266.51 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:38:23,120 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-08-19 13:38:42,709 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 32 from LS+wenet, 32 from Vox, 24 from AS 2024-08-19 13:38:53,082 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 22 from LS+wenet, 28 from Vox, 33 from AS 2024-08-19 13:39:15,175 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-08-19 13:39:19,706 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4489190.0, ans=0.0 2024-08-19 13:39:23,765 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.73 vs. 
limit=15.0 2024-08-19 13:39:36,972 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.711e+01 2.321e+01 2.560e+01 2.840e+01 5.054e+01, threshold=5.120e+01, percent-clipped=0.0 2024-08-19 13:39:49,756 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4350, loss[loss=0.1039, beats_loss=0.01014, ecapa_loss=0.0001541, whisper_loss=0.09222, over 22487.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01039, ecapa_loss=0.0001416, whisper_loss=0.08957, over 3849467.64 frames. ], batch size: 95, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:39:52,439 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4489390.0, ans=0.1 2024-08-19 13:39:55,261 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 28 from LS+wenet, 19 from Vox, 26 from AS 2024-08-19 13:39:57,217 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 35 from LS+wenet, 27 from Vox, 30 from AS 2024-08-19 13:40:46,047 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 16 from LS+wenet, 11 from Vox, 28 from AS 2024-08-19 13:40:52,603 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0 2024-08-19 13:40:56,539 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0 2024-08-19 13:41:05,085 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 28 from LS+wenet, 20 from Vox, 32 from AS 2024-08-19 13:41:18,873 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 36 from LS+wenet, 13 from Vox, 37 from AS 2024-08-19 13:41:22,322 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4400, loss[loss=0.1072, beats_loss=0.009336, ecapa_loss=0.0001163, whisper_loss=0.0967, over 21266.00 frames. 
], tot_loss[loss=0.1015, beats_loss=0.01038, ecapa_loss=0.0001403, whisper_loss=0.08969, over 3826187.62 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:42:11,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4490090.0, ans=0.125 2024-08-19 13:42:15,474 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=4490090.0, ans=0.2 2024-08-19 13:42:21,252 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 33 from LS+wenet, 12 from Vox, 32 from AS 2024-08-19 13:42:21,480 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4490190.0, ans=0.0 2024-08-19 13:42:26,257 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.53 vs. limit=12.0 2024-08-19 13:42:29,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.89 vs. limit=22.5 2024-08-19 13:42:41,098 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4490290.0, ans=0.125 2024-08-19 13:42:45,469 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.746e+01 2.324e+01 2.485e+01 2.810e+01 3.864e+01, threshold=4.971e+01, percent-clipped=0.0 2024-08-19 13:42:49,395 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2024-08-19 13:43:00,440 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4450, loss[loss=0.09859, beats_loss=0.009289, ecapa_loss=0.0001508, whisper_loss=0.08779, over 20479.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01036, ecapa_loss=0.0001405, whisper_loss=0.09022, over 3826746.06 frames. 
], batch size: 83, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:43:41,258 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.41 vs. limit=22.5 2024-08-19 13:43:53,446 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 24 from Vox, 27 from AS 2024-08-19 13:44:53,135 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4500, loss[loss=0.08222, beats_loss=0.01111, ecapa_loss=0.0001459, whisper_loss=0.06965, over 15174.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01036, ecapa_loss=0.0001408, whisper_loss=0.09006, over 3850323.46 frames. ], batch size: 66, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:44:54,148 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 28 from LS+wenet, 27 from Vox, 33 from AS 2024-08-19 13:45:00,789 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 26 from Vox, 32 from AS 2024-08-19 13:45:02,991 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4490890.0, ans=0.125 2024-08-19 13:45:06,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4490890.0, ans=10.0 2024-08-19 13:45:08,608 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 21 from Vox, 22 from AS 2024-08-19 13:45:29,081 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 26 from Vox, 38 from AS 2024-08-19 13:45:46,941 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
23 from LS+wenet, 27 from Vox, 37 from AS 2024-08-19 13:46:15,448 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=4491290.0, ans=0.2 2024-08-19 13:46:22,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.756e+01 2.269e+01 2.513e+01 2.857e+01 4.975e+01, threshold=5.026e+01, percent-clipped=1.0 2024-08-19 13:46:23,370 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 28 from Vox, 40 from AS 2024-08-19 13:46:25,789 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4491290.0, ans=0.125 2024-08-19 13:46:37,958 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4550, loss[loss=0.1049, beats_loss=0.01052, ecapa_loss=0.0001459, whisper_loss=0.09293, over 23167.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001418, whisper_loss=0.0901, over 3886689.69 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:47:06,325 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 31 from LS+wenet, 22 from Vox, 41 from AS 2024-08-19 13:47:06,523 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4491490.0, ans=0.125 2024-08-19 13:47:22,858 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4491590.0, ans=0.125 2024-08-19 13:47:26,961 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
29 from LS+wenet, 26 from Vox, 36 from AS 2024-08-19 13:47:27,137 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4491590.0, ans=0.1 2024-08-19 13:47:30,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4491590.0, ans=0.125 2024-08-19 13:47:33,036 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=15.0 2024-08-19 13:47:39,629 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4491690.0, ans=0.125 2024-08-19 13:47:39,642 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4491690.0, ans=0.125 2024-08-19 13:47:57,159 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 31 from LS+wenet, 20 from Vox, 22 from AS 2024-08-19 13:48:04,262 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4491790.0, ans=0.0 2024-08-19 13:48:11,589 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4600, loss[loss=0.09484, beats_loss=0.009783, ecapa_loss=0.0001128, whisper_loss=0.08392, over 16563.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01034, ecapa_loss=0.0001418, whisper_loss=0.09019, over 3900930.85 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:48:44,297 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 30 from LS+wenet, 21 from Vox, 36 from AS 2024-08-19 13:48:55,922 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0 2024-08-19 13:49:19,493 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 
23 from LS+wenet, 20 from Vox, 31 from AS 2024-08-19 13:49:25,678 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 17 from Vox, 43 from AS 2024-08-19 13:49:29,703 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4492290.0, ans=0.0 2024-08-19 13:49:31,930 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.899e+01 2.304e+01 2.544e+01 2.901e+01 6.121e+01, threshold=5.088e+01, percent-clipped=1.0 2024-08-19 13:49:43,897 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4650, loss[loss=0.1044, beats_loss=0.01006, ecapa_loss=0.0001389, whisper_loss=0.09297, over 19708.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01039, ecapa_loss=0.0001408, whisper_loss=0.08988, over 3902362.10 frames. ], batch size: 80, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:49:52,947 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 21 from LS+wenet, 31 from Vox, 35 from AS 2024-08-19 13:49:56,623 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4492390.0, ans=0.125 2024-08-19 13:49:56,626 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4492390.0, ans=0.2 2024-08-19 13:50:23,788 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 35 from LS+wenet, 25 from Vox, 31 from AS 2024-08-19 13:50:39,907 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=6.77 vs. 
limit=15.0 2024-08-19 13:50:57,972 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4492790.0, ans=0.0 2024-08-19 13:51:02,805 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=4492790.0, ans=0.95 2024-08-19 13:51:14,857 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4700, loss[loss=0.1347, beats_loss=0.008541, ecapa_loss=0.0001607, whisper_loss=0.1246, over 21645.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.0104, ecapa_loss=0.0001411, whisper_loss=0.09048, over 3911216.26 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:51:22,630 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 32 from LS+wenet, 17 from Vox, 40 from AS 2024-08-19 13:51:22,836 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4492890.0, ans=0.0 2024-08-19 13:51:34,763 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 25 from LS+wenet, 17 from Vox, 29 from AS 2024-08-19 13:51:40,231 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4492990.0, ans=0.0 2024-08-19 13:51:50,848 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4493090.0, ans=0.125 2024-08-19 13:52:00,769 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
25 from LS+wenet, 27 from Vox, 42 from AS 2024-08-19 13:52:01,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4493090.0, ans=0.0 2024-08-19 13:52:03,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=4493090.0, ans=0.0 2024-08-19 13:52:07,114 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2024-08-19 13:52:07,347 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2024-08-19 13:52:15,596 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4493190.0, ans=0.125 2024-08-19 13:52:17,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4493190.0, ans=0.0 2024-08-19 13:52:24,441 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=4493290.0, ans=0.0 2024-08-19 13:52:30,133 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.714e+01 2.412e+01 2.639e+01 2.968e+01 4.627e+01, threshold=5.277e+01, percent-clipped=0.0 2024-08-19 13:52:42,730 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4750, loss[loss=0.1077, beats_loss=0.01016, ecapa_loss=0.0001532, whisper_loss=0.09596, over 17241.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01042, ecapa_loss=0.0001403, whisper_loss=0.09018, over 3891205.48 frames. 
], batch size: 70, lr: 1.98e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 13:52:50,818 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=4493390.0, ans=0.025 2024-08-19 13:53:06,061 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 28 from LS+wenet, 23 from Vox, 26 from AS 2024-08-19 13:53:08,968 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 19 from Vox, 25 from AS 2024-08-19 13:53:17,170 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=4493590.0, ans=0.125 2024-08-19 13:53:41,906 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4493690.0, ans=0.125 2024-08-19 13:53:41,909 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4493690.0, ans=0.125 2024-08-19 13:53:43,089 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 20 from LS+wenet, 22 from Vox, 42 from AS 2024-08-19 13:53:49,725 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=4493690.0, ans=0.2 2024-08-19 13:53:56,706 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=15.0 2024-08-19 13:54:14,327 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4800, loss[loss=0.07815, beats_loss=0.009701, ecapa_loss=0.0001565, whisper_loss=0.06688, over 15828.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01047, ecapa_loss=0.0001409, whisper_loss=0.08955, over 3894111.42 frames. 
], batch size: 64, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:54:19,077 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.037e+05 2024-08-19 13:54:22,092 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 16 from LS+wenet, 23 from Vox, 27 from AS 2024-08-19 13:54:27,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4493890.0, ans=0.125 2024-08-19 13:54:34,515 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=15.0 2024-08-19 13:54:35,779 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=4493990.0, ans=0.2 2024-08-19 13:54:40,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=4493990.0, ans=0.2 2024-08-19 13:54:49,190 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4494090.0, ans=0.2 2024-08-19 13:55:26,737 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.873e+01 2.328e+01 2.534e+01 2.852e+01 3.872e+01, threshold=5.068e+01, percent-clipped=0.0 2024-08-19 13:55:35,219 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4494290.0, ans=0.1 2024-08-19 13:55:38,094 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4850, loss[loss=0.07752, beats_loss=0.01282, ecapa_loss=9.855e-05, whisper_loss=0.06371, over 14139.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.01049, ecapa_loss=0.0001419, whisper_loss=0.08943, over 3868764.43 frames. 
], batch size: 53, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:55:48,879 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=12.0 2024-08-19 13:56:10,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4494590.0, ans=0.0 2024-08-19 13:56:16,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4494590.0, ans=0.0 2024-08-19 13:56:23,746 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4494590.0, ans=0.2 2024-08-19 13:56:25,411 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4494590.0, ans=0.1 2024-08-19 13:56:28,286 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 17 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 13:56:33,544 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 21 from LS+wenet, 23 from Vox, 26 fro AS 2024-08-19 13:57:03,360 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4900, loss[loss=0.1075, beats_loss=0.009612, ecapa_loss=0.0001505, whisper_loss=0.0964, over 17217.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01042, ecapa_loss=0.0001423, whisper_loss=0.09009, over 3856297.48 frames. ], batch size: 71, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:57:04,002 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 12 from Vox, 23 fro AS 2024-08-19 13:57:07,537 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
28 from LS+wenet, 33 from Vox, 29 fro AS 2024-08-19 13:57:07,811 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4494890.0, ans=0.1 2024-08-19 13:57:09,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4494890.0, ans=0.0 2024-08-19 13:57:14,226 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4494890.0, ans=0.0 2024-08-19 13:57:25,831 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 13 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 13:57:30,817 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 22 from LS+wenet, 24 from Vox, 34 fro AS 2024-08-19 13:57:58,365 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 32 from LS+wenet, 28 from Vox, 35 fro AS 2024-08-19 13:58:01,075 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 15 from LS+wenet, 24 from Vox, 26 fro AS 2024-08-19 13:58:10,882 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4495290.0, ans=0.125 2024-08-19 13:58:14,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=4495290.0, ans=0.0 2024-08-19 13:58:15,437 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.052e+01 2.360e+01 2.527e+01 2.756e+01 4.531e+01, threshold=5.055e+01, percent-clipped=0.0 2024-08-19 13:58:16,339 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4495290.0, ans=0.1 2024-08-19 13:58:19,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4495290.0, ans=0.1 2024-08-19 13:58:20,182 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
17 from LS+wenet, 17 from Vox, 25 fro AS 2024-08-19 13:58:25,263 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 4950, loss[loss=0.1086, beats_loss=0.01041, ecapa_loss=0.0001298, whisper_loss=0.09685, over 16024.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01036, ecapa_loss=0.0001431, whisper_loss=0.09011, over 3818737.04 frames. ], batch size: 60, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:58:25,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4495390.0, ans=0.0 2024-08-19 13:58:35,449 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 26 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 13:58:43,434 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 31 from LS+wenet, 31 from Vox, 31 fro AS 2024-08-19 13:58:49,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4495490.0, ans=0.125 2024-08-19 13:59:02,038 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-08-19 13:59:04,876 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4495590.0, ans=0.0 2024-08-19 13:59:06,175 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 21 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 13:59:06,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4495590.0, ans=0.0 2024-08-19 13:59:18,860 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4495690.0, ans=0.07 2024-08-19 13:59:31,562 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 17 from Vox, 30 fro AS 2024-08-19 13:59:38,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4495790.0, ans=0.0 2024-08-19 13:59:40,153 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4495790.0, ans=0.0 2024-08-19 13:59:49,056 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5000, loss[loss=0.09103, beats_loss=0.01277, ecapa_loss=0.0001634, whisper_loss=0.07663, over 17760.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01028, ecapa_loss=0.0001433, whisper_loss=0.09017, over 3781403.45 frames. ], batch size: 75, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 13:59:54,823 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 33 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 13:59:54,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4495890.0, ans=0.125 2024-08-19 14:00:13,485 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 14:00:13,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4495990.0, ans=0.1 2024-08-19 14:00:30,715 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4496090.0, ans=0.125 2024-08-19 14:00:46,108 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 17 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 14:01:03,575 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 
30 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 14:01:07,621 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.870e+01 2.293e+01 2.592e+01 2.875e+01 4.425e+01, threshold=5.184e+01, percent-clipped=0.0 2024-08-19 14:01:17,805 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5050, loss[loss=0.1219, beats_loss=0.009088, ecapa_loss=0.0001479, whisper_loss=0.1113, over 23814.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01042, ecapa_loss=0.0001433, whisper_loss=0.09029, over 3827308.59 frames. ], batch size: 91, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:01:20,733 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 26 from LS+wenet, 22 from Vox, 38 fro AS 2024-08-19 14:01:35,926 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:01:46,076 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 24 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-19 14:01:47,549 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 15 from LS+wenet, 18 from Vox, 25 fro AS 2024-08-19 14:02:05,582 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=4496690.0, ans=0.0 2024-08-19 14:02:11,022 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4496690.0, ans=0.2 2024-08-19 14:02:41,957 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5100, loss[loss=0.0792, beats_loss=0.01243, ecapa_loss=0.0001233, whisper_loss=0.06554, over 15409.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01045, ecapa_loss=0.0001423, whisper_loss=0.09009, over 3842088.44 frames. ], batch size: 62, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:02:45,379 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 18 from Vox, 46 fro AS 2024-08-19 14:02:46,781 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
35 from LS+wenet, 19 from Vox, 34 fro AS 2024-08-19 14:02:47,177 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4496890.0, ans=0.125 2024-08-19 14:02:59,173 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.26 vs. limit=15.0 2024-08-19 14:03:06,490 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 14 from Vox, 37 fro AS 2024-08-19 14:03:24,993 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 18 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 14:03:36,979 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 28 from LS+wenet, 32 from Vox, 26 fro AS 2024-08-19 14:03:48,518 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 22 from Vox, 40 fro AS 2024-08-19 14:03:55,063 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 19 from LS+wenet, 27 from Vox, 34 fro AS 2024-08-19 14:03:56,381 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.827e+01 2.306e+01 2.543e+01 2.887e+01 4.105e+01, threshold=5.087e+01, percent-clipped=0.0 2024-08-19 14:04:03,760 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4497290.0, ans=0.2 2024-08-19 14:04:05,038 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 30 from LS+wenet, 22 from Vox, 32 fro AS 2024-08-19 14:04:06,697 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5150, loss[loss=0.105, beats_loss=0.009201, ecapa_loss=0.0001568, whisper_loss=0.09419, over 20405.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01053, ecapa_loss=0.0001406, whisper_loss=0.08991, over 3851781.64 frames. 
], batch size: 84, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:04:38,089 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4497490.0, ans=0.125 2024-08-19 14:04:48,276 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 26 from LS+wenet, 23 from Vox, 31 fro AS 2024-08-19 14:05:04,572 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2024-08-19 14:05:28,791 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5200, loss[loss=0.09973, beats_loss=0.01034, ecapa_loss=0.0001492, whisper_loss=0.08791, over 22027.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01051, ecapa_loss=0.0001399, whisper_loss=0.09012, over 3865008.45 frames. ], batch size: 89, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:05:30,609 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4497890.0, ans=0.1 2024-08-19 14:05:42,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=4497890.0, ans=0.0 2024-08-19 14:05:55,348 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 
29 from LS+wenet, 15 from Vox, 41 fro AS 2024-08-19 14:05:58,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=4497990.0, ans=0.125 2024-08-19 14:06:01,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=4498090.0, ans=0.125 2024-08-19 14:06:25,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4498190.0, ans=0.125 2024-08-19 14:06:42,506 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.701e+01 2.299e+01 2.552e+01 2.822e+01 3.690e+01, threshold=5.104e+01, percent-clipped=0.0 2024-08-19 14:06:48,789 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 14:06:52,373 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5250, loss[loss=0.09907, beats_loss=0.008629, ecapa_loss=0.0001517, whisper_loss=0.08892, over 16554.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01052, ecapa_loss=0.000139, whisper_loss=0.08907, over 3834570.34 frames. ], batch size: 64, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:06:58,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.71 vs. limit=12.0 2024-08-19 14:07:07,877 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 18 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 14:07:10,003 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
20 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 14:07:13,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=4498490.0, ans=0.2 2024-08-19 14:07:24,550 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4498490.0, ans=0.025 2024-08-19 14:08:07,935 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4498790.0, ans=0.125 2024-08-19 14:08:16,773 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 20 from LS+wenet, 9 from Vox, 26 fro AS 2024-08-19 14:08:19,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5300, loss[loss=0.1021, beats_loss=0.01135, ecapa_loss=0.0001568, whisper_loss=0.08917, over 21127.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01048, ecapa_loss=0.0001391, whisper_loss=0.08966, over 3850194.38 frames. ], batch size: 88, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:08:26,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2024-08-19 14:08:28,590 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4498890.0, ans=0.125 2024-08-19 14:09:31,259 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.981e+01 2.343e+01 2.651e+01 2.972e+01 3.624e+01, threshold=5.303e+01, percent-clipped=0.0 2024-08-19 14:09:40,834 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5350, loss[loss=0.1054, beats_loss=0.009653, ecapa_loss=0.0001555, whisper_loss=0.09416, over 19294.00 frames. ], tot_loss[loss=0.1013, beats_loss=0.0105, ecapa_loss=0.0001392, whisper_loss=0.08939, over 3870665.62 frames. 
], batch size: 77, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:09:54,580 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 18 from Vox, 33 fro AS 2024-08-19 14:10:17,825 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.30 vs. limit=15.0 2024-08-19 14:10:24,375 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 19 from LS+wenet, 34 from Vox, 37 fro AS 2024-08-19 14:10:52,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4499790.0, ans=0.0 2024-08-19 14:10:53,569 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 24 from LS+wenet, 21 from Vox, 31 fro AS 2024-08-19 14:10:55,744 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4499790.0, ans=0.0 2024-08-19 14:11:03,306 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=4499790.0, ans=0.2 2024-08-19 14:11:09,939 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5400, loss[loss=0.08986, beats_loss=0.01076, ecapa_loss=0.0001194, whisper_loss=0.0779, over 22726.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.0105, ecapa_loss=0.0001394, whisper_loss=0.08887, over 3870178.67 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:11:13,704 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4499890.0, ans=0.0 2024-08-19 14:11:39,959 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=22.5 2024-08-19 14:11:44,638 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 17 from Vox, 46 fro AS 2024-08-19 14:12:06,850 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4500190.0, ans=0.125 2024-08-19 14:12:06,893 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4500190.0, ans=0.125 2024-08-19 14:12:19,037 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 16 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 14:12:21,858 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-08-19 14:12:26,262 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.944e+01 2.275e+01 2.501e+01 2.873e+01 4.130e+01, threshold=5.001e+01, percent-clipped=0.0 2024-08-19 14:12:26,441 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 27 from LS+wenet, 17 from Vox, 35 fro AS 2024-08-19 14:12:32,278 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4500290.0, ans=0.125 2024-08-19 14:12:35,994 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4500390.0, ans=0.125 2024-08-19 14:12:36,752 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5450, loss[loss=0.096, beats_loss=0.01044, ecapa_loss=0.0001651, whisper_loss=0.08391, over 18043.00 frames. ], tot_loss[loss=0.1012, beats_loss=0.01045, ecapa_loss=0.0001398, whisper_loss=0.08937, over 3874835.69 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:12:51,036 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 14:12:55,905 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 14 from Vox, 32 fro AS 2024-08-19 14:12:56,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4500490.0, ans=0.0 2024-08-19 14:13:12,477 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 29 from LS+wenet, 20 from Vox, 48 fro AS 2024-08-19 14:13:17,069 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4500590.0, ans=0.5 2024-08-19 14:13:18,292 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 19 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 14:13:20,641 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 25 from LS+wenet, 25 from Vox, 38 fro AS 2024-08-19 14:13:29,611 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.10 vs. limit=10.0 2024-08-19 14:13:52,786 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4500790.0, ans=0.0 2024-08-19 14:13:58,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4500790.0, ans=0.0 2024-08-19 14:14:07,494 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5500, loss[loss=0.09727, beats_loss=0.0113, ecapa_loss=0.0001557, whisper_loss=0.08442, over 22388.00 frames. ], tot_loss[loss=0.1009, beats_loss=0.01047, ecapa_loss=0.0001395, whisper_loss=0.089, over 3855498.89 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:14:08,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4500890.0, ans=0.125 2024-08-19 14:14:32,486 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.10 vs. 
limit=15.0 2024-08-19 14:14:54,899 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4501090.0, ans=0.125 2024-08-19 14:15:18,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4501190.0, ans=0.035 2024-08-19 14:15:21,283 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 37 from LS+wenet, 20 from Vox, 34 fro AS 2024-08-19 14:15:29,747 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.891e+01 2.304e+01 2.520e+01 2.871e+01 1.187e+02, threshold=5.040e+01, percent-clipped=1.0 2024-08-19 14:15:32,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4501290.0, ans=0.1 2024-08-19 14:15:42,781 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5550, loss[loss=0.1145, beats_loss=0.008799, ecapa_loss=0.0001344, whisper_loss=0.1044, over 19306.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01043, ecapa_loss=0.0001399, whisper_loss=0.08996, over 3882052.57 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:15:51,379 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.70 vs. limit=15.0 2024-08-19 14:15:54,038 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4501390.0, ans=0.125 2024-08-19 14:16:28,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4501590.0, ans=0.125 2024-08-19 14:16:49,176 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 14:17:06,768 INFO [train_multi_KD3.py:844] (3/4) A total of 54 cuts. 
19 from LS+wenet, 16 from Vox, 19 fro AS 2024-08-19 14:17:08,242 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=4501790.0, ans=0.05 2024-08-19 14:17:17,871 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5600, loss[loss=0.1031, beats_loss=0.01145, ecapa_loss=0.0001076, whisper_loss=0.09061, over 23149.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01044, ecapa_loss=0.0001405, whisper_loss=0.08991, over 3891690.25 frames. ], batch size: 87, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:17:25,634 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4501890.0, ans=0.95 2024-08-19 14:17:42,666 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 19 from LS+wenet, 17 from Vox, 32 fro AS 2024-08-19 14:17:53,507 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.04 vs. limit=15.0 2024-08-19 14:17:55,171 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 28 from LS+wenet, 26 from Vox, 29 fro AS 2024-08-19 14:18:24,767 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4502190.0, ans=0.125 2024-08-19 14:18:39,441 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.342e+01 2.515e+01 2.771e+01 5.960e+01, threshold=5.030e+01, percent-clipped=1.0 2024-08-19 14:18:49,859 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5650, loss[loss=0.09671, beats_loss=0.008547, ecapa_loss=0.0001734, whisper_loss=0.08643, over 16380.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001416, whisper_loss=0.09045, over 3880569.22 frames. 
], batch size: 66, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:18:59,253 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4502390.0, ans=0.125 2024-08-19 14:19:04,561 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4502490.0, ans=0.1 2024-08-19 14:19:16,819 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 14:19:47,999 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 14:19:53,530 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 17 from Vox, 47 fro AS 2024-08-19 14:20:23,914 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4502790.0, ans=0.125 2024-08-19 14:20:38,416 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4502790.0, ans=0.125 2024-08-19 14:20:48,575 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5700, loss[loss=0.08919, beats_loss=0.01357, ecapa_loss=0.0001244, whisper_loss=0.07438, over 21866.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01046, ecapa_loss=0.0001398, whisper_loss=0.09056, over 3921089.62 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:21:13,764 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.83 vs. limit=15.0 2024-08-19 14:21:18,353 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 31 from Vox, 33 fro AS 2024-08-19 14:21:42,296 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
20 from LS+wenet, 17 from Vox, 24 fro AS 2024-08-19 14:21:43,393 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=4503090.0, ans=0.0 2024-08-19 14:22:05,882 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 14 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 14:22:47,176 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.823e+01 2.393e+01 2.644e+01 2.920e+01 4.310e+01, threshold=5.288e+01, percent-clipped=0.0 2024-08-19 14:23:03,905 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5750, loss[loss=0.08697, beats_loss=0.01191, ecapa_loss=0.0001304, whisper_loss=0.07376, over 15956.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01041, ecapa_loss=0.000141, whisper_loss=0.09043, over 3914220.97 frames. ], batch size: 66, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:23:35,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=15.0 2024-08-19 14:24:07,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4503590.0, ans=0.125 2024-08-19 14:24:12,976 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 14:24:37,320 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4503690.0, ans=0.0 2024-08-19 14:24:37,331 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4503690.0, ans=0.1 2024-08-19 14:24:37,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4503690.0, ans=0.1 2024-08-19 14:25:08,749 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 
21 from LS+wenet, 16 from Vox, 36 fro AS 2024-08-19 14:25:15,079 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 23 from LS+wenet, 21 from Vox, 29 fro AS 2024-08-19 14:25:15,868 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5800, loss[loss=0.1076, beats_loss=0.009117, ecapa_loss=0.0001592, whisper_loss=0.09691, over 17998.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.0904, over 3927375.22 frames. ], batch size: 73, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:25:22,060 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 26 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 14:25:23,760 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 13 from LS+wenet, 24 from Vox, 24 fro AS 2024-08-19 14:25:44,192 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.35 vs. limit=15.0 2024-08-19 14:26:25,134 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 27 from LS+wenet, 21 from Vox, 20 fro AS 2024-08-19 14:26:35,461 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.790e+01 2.385e+01 2.563e+01 2.878e+01 9.355e+01, threshold=5.127e+01, percent-clipped=2.0 2024-08-19 14:26:38,677 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4504290.0, ans=0.0 2024-08-19 14:26:47,228 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5850, loss[loss=0.0955, beats_loss=0.01155, ecapa_loss=0.0001349, whisper_loss=0.0826, over 19954.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01048, ecapa_loss=0.0001406, whisper_loss=0.0895, over 3913610.35 frames. 
], batch size: 81, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:27:06,508 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4504490.0, ans=0.0 2024-08-19 14:27:26,475 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4504490.0, ans=0.0 2024-08-19 14:27:28,793 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 18 from LS+wenet, 15 from Vox, 43 fro AS 2024-08-19 14:27:46,908 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 18 from LS+wenet, 25 from Vox, 28 fro AS 2024-08-19 14:27:53,502 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4504690.0, ans=0.125 2024-08-19 14:27:54,799 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 16 from LS+wenet, 15 from Vox, 42 fro AS 2024-08-19 14:27:55,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4504690.0, ans=0.1 2024-08-19 14:28:27,525 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4504890.0, ans=0.125 2024-08-19 14:28:28,253 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5900, loss[loss=0.08146, beats_loss=0.01341, ecapa_loss=0.0001394, whisper_loss=0.06666, over 14895.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001411, whisper_loss=0.08956, over 3930403.12 frames. ], batch size: 61, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:28:33,060 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4504890.0, ans=0.0 2024-08-19 14:28:35,613 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 
29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 14:29:00,934 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2024-08-19 14:29:14,653 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=4505090.0, ans=0.0 2024-08-19 14:29:36,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=4505190.0, ans=0.0 2024-08-19 14:29:43,077 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-08-19 14:29:51,846 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.828e+01 2.307e+01 2.493e+01 2.853e+01 3.698e+01, threshold=4.986e+01, percent-clipped=0.0 2024-08-19 14:30:00,129 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0 2024-08-19 14:30:02,190 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 5950, loss[loss=0.08569, beats_loss=0.01259, ecapa_loss=0.0001244, whisper_loss=0.07186, over 20763.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01057, ecapa_loss=0.0001414, whisper_loss=0.08903, over 3940078.70 frames. 
], batch size: 83, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:30:07,143 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4505390.0, ans=0.125 2024-08-19 14:30:37,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4505490.0, ans=0.125 2024-08-19 14:30:55,890 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4505590.0, ans=0.0 2024-08-19 14:31:14,024 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-08-19 14:31:24,051 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4505790.0, ans=0.1 2024-08-19 14:31:27,934 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4505790.0, ans=0.125 2024-08-19 14:31:29,984 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4505790.0, ans=0.125 2024-08-19 14:31:42,948 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6000, loss[loss=0.09769, beats_loss=0.00733, ecapa_loss=0.0001821, whisper_loss=0.08854, over 20263.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01054, ecapa_loss=0.0001415, whisper_loss=0.08968, over 3935634.07 frames. ], batch size: 82, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:31:42,948 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 14:32:32,866 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on ASR_libri: loss=0.2517, beats_loss=0, ecapa_loss=0.0005139, whisper_loss=0.2466, over 922467.00 frames. 
2024-08-19 14:32:50,709 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.003959, beats_loss=0, ecapa_loss=0.0003959, whisper_loss=0, over 939242.00 frames. 2024-08-19 14:34:38,988 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on AT_audioset: loss=0.02302, beats_loss=0.02302, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 14:34:38,992 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 14:34:41,814 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.00 vs. limit=22.5 2024-08-19 14:34:45,522 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2024-08-19 14:35:04,842 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 18 from Vox, 42 fro AS 2024-08-19 14:35:43,804 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2024-08-19 14:35:45,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=4506190.0, ans=0.02 2024-08-19 14:35:54,216 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 27 from LS+wenet, 22 from Vox, 36 fro AS 2024-08-19 14:35:55,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.888e+01 2.316e+01 2.553e+01 2.863e+01 1.087e+02, threshold=5.107e+01, percent-clipped=1.0 2024-08-19 14:36:01,157 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=15.0 2024-08-19 14:36:07,612 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6050, loss[loss=0.1084, beats_loss=0.01183, ecapa_loss=0.0001583, whisper_loss=0.09503, over 18714.00 frames. 
], tot_loss[loss=0.1022, beats_loss=0.01054, ecapa_loss=0.0001407, whisper_loss=0.09028, over 3947742.61 frames. ], batch size: 77, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:36:08,010 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 13 from Vox, 29 fro AS 2024-08-19 14:36:20,150 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 23 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 14:36:32,541 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=4506490.0, ans=0.0 2024-08-19 14:36:41,314 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 26 from LS+wenet, 14 from Vox, 26 fro AS 2024-08-19 14:36:41,600 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=4506490.0, ans=0.025 2024-08-19 14:36:56,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.92 vs. limit=15.0 2024-08-19 14:37:09,912 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4506690.0, ans=0.0 2024-08-19 14:37:42,763 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6100, loss[loss=0.08689, beats_loss=0.01373, ecapa_loss=0.0001086, whisper_loss=0.07207, over 16072.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0106, ecapa_loss=0.0001417, whisper_loss=0.08991, over 3927630.42 frames. ], batch size: 65, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:37:43,997 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=4506890.0, ans=0.125 2024-08-19 14:37:50,633 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.54 vs. 
limit=15.0 2024-08-19 14:37:59,579 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 30 from LS+wenet, 26 from Vox, 33 fro AS 2024-08-19 14:38:12,569 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 20 from LS+wenet, 16 from Vox, 23 fro AS 2024-08-19 14:38:20,156 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 15 from LS+wenet, 17 from Vox, 27 fro AS 2024-08-19 14:38:22,066 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=4507090.0, ans=0.0 2024-08-19 14:38:35,717 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4507190.0, ans=0.125 2024-08-19 14:38:58,026 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.787e+01 2.243e+01 2.514e+01 2.857e+01 4.099e+01, threshold=5.029e+01, percent-clipped=0.0 2024-08-19 14:38:58,610 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 14:39:07,563 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6150, loss[loss=0.08388, beats_loss=0.01031, ecapa_loss=0.0001597, whisper_loss=0.07197, over 13415.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01057, ecapa_loss=0.0001419, whisper_loss=0.09046, over 3890604.50 frames. 
], batch size: 54, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:39:10,298 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4507390.0, ans=0.015 2024-08-19 14:39:14,002 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4507390.0, ans=0.0 2024-08-19 14:39:18,697 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4507390.0, ans=0.1 2024-08-19 14:39:25,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=4507490.0, ans=22.5 2024-08-19 14:39:33,801 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 27 from LS+wenet, 19 from Vox, 35 fro AS 2024-08-19 14:39:38,233 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2024-08-19 14:39:41,862 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4507490.0, ans=0.125 2024-08-19 14:39:48,985 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 14 from Vox, 27 fro AS 2024-08-19 14:39:52,705 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 36 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 14:40:15,313 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 30 from LS+wenet, 30 from Vox, 33 fro AS 2024-08-19 14:40:26,704 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.68 vs. 
limit=22.5 2024-08-19 14:40:33,936 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4507790.0, ans=0.125 2024-08-19 14:40:38,439 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6200, loss[loss=0.09926, beats_loss=0.01365, ecapa_loss=0.0001031, whisper_loss=0.08458, over 22097.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01048, ecapa_loss=0.0001416, whisper_loss=0.09093, over 3888211.13 frames. ], batch size: 86, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:40:40,544 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4507890.0, ans=0.0 2024-08-19 14:41:40,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4508190.0, ans=0.0 2024-08-19 14:41:43,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4508190.0, ans=0.0 2024-08-19 14:42:00,966 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.757e+01 2.259e+01 2.437e+01 2.769e+01 4.793e+01, threshold=4.873e+01, percent-clipped=0.0 2024-08-19 14:42:08,588 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 27 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 14:42:10,025 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6250, loss[loss=0.1107, beats_loss=0.009381, ecapa_loss=0.0001478, whisper_loss=0.09988, over 19692.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.09075, over 3874684.92 frames. ], batch size: 76, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:42:13,065 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 
27 from LS+wenet, 24 from Vox, 39 fro AS 2024-08-19 14:42:26,256 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4508490.0, ans=0.2 2024-08-19 14:42:37,378 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-08-19 14:42:38,152 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4508590.0, ans=0.125 2024-08-19 14:42:50,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4508690.0, ans=0.0 2024-08-19 14:43:01,824 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4508690.0, ans=0.125 2024-08-19 14:43:04,662 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4508790.0, ans=0.125 2024-08-19 14:43:04,675 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4508790.0, ans=0.0 2024-08-19 14:43:18,928 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 14:43:20,367 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6300, loss[loss=0.1019, beats_loss=0.01058, ecapa_loss=0.0001673, whisper_loss=0.08963, over 16634.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01038, ecapa_loss=0.000141, whisper_loss=0.09065, over 3886461.20 frames. 
], batch size: 68, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:43:20,856 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4508890.0, ans=0.5 2024-08-19 14:43:22,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=4508890.0, ans=0.0 2024-08-19 14:43:24,624 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 25 from Vox, 39 fro AS 2024-08-19 14:43:26,663 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2024-08-19 14:43:43,770 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=4508990.0, ans=0.125 2024-08-19 14:44:09,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4509190.0, ans=0.125 2024-08-19 14:44:11,096 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4509190.0, ans=0.125 2024-08-19 14:44:23,698 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.807e+01 2.387e+01 2.658e+01 3.235e+01 4.854e+01, threshold=5.315e+01, percent-clipped=0.0 2024-08-19 14:44:32,903 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6350, loss[loss=0.08164, beats_loss=0.009517, ecapa_loss=0.0001656, whisper_loss=0.07047, over 13593.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001408, whisper_loss=0.09047, over 3879123.14 frames. ], batch size: 56, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:44:39,568 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4509390.0, ans=0.125 2024-08-19 14:44:40,841 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
31 from LS+wenet, 20 from Vox, 37 fro AS 2024-08-19 14:44:47,523 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.66 vs. limit=6.0 2024-08-19 14:44:53,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4509490.0, ans=0.125 2024-08-19 14:44:56,485 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4509490.0, ans=0.1 2024-08-19 14:45:21,313 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4509690.0, ans=0.04949747468305833 2024-08-19 14:45:24,573 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4509690.0, ans=0.125 2024-08-19 14:45:28,430 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 13 from LS+wenet, 19 from Vox, 28 fro AS 2024-08-19 14:45:29,592 WARNING [optim.py:496] (3/4) Scaling gradients by 0.010832761414349079, model_norm_threshold=53.15283203125 2024-08-19 14:45:29,755 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.45, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.092e+07, grad_sumsq=1.044e+09, orig_rms_sq=1.046e-02 2024-08-19 14:45:41,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4509790.0, ans=0.1 2024-08-19 14:45:43,926 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6400, loss[loss=0.1091, beats_loss=0.01045, ecapa_loss=0.0001187, whisper_loss=0.09744, over 23226.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01045, ecapa_loss=0.0001396, whisper_loss=0.09086, over 3888977.85 frames. 
], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:45:52,596 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.94 vs. limit=6.0 2024-08-19 14:46:07,398 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4509990.0, ans=0.125 2024-08-19 14:46:12,762 INFO [train_multi_KD3.py:844] (3/4) A total of 56 cuts. 18 from LS+wenet, 16 from Vox, 22 fro AS 2024-08-19 14:46:14,171 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 26 from LS+wenet, 21 from Vox, 25 fro AS 2024-08-19 14:46:18,879 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4510090.0, ans=0.0 2024-08-19 14:46:37,252 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4510190.0, ans=0.1 2024-08-19 14:46:46,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.039e+01 2.363e+01 2.610e+01 2.887e+01 4.907e+03, threshold=5.221e+01, percent-clipped=2.0 2024-08-19 14:46:51,788 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2024-08-19 14:46:55,554 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6450, loss[loss=0.08616, beats_loss=0.01143, ecapa_loss=0.000135, whisper_loss=0.07338, over 19225.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01037, ecapa_loss=0.0001399, whisper_loss=0.09152, over 3893720.53 frames. 
], batch size: 78, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:47:10,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4510490.0, ans=0.125 2024-08-19 14:47:12,940 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4510490.0, ans=0.0 2024-08-19 14:47:21,484 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 14:47:27,203 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 26 from LS+wenet, 23 from Vox, 36 fro AS 2024-08-19 14:47:28,661 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 23 from LS+wenet, 14 from Vox, 45 fro AS 2024-08-19 14:47:31,708 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=4510590.0, ans=0.125 2024-08-19 14:47:37,771 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=4510690.0, ans=10.0 2024-08-19 14:47:38,209 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2024-08-19 14:47:38,271 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.21 vs. limit=22.5 2024-08-19 14:47:43,825 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 22 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 14:48:02,816 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4510790.0, ans=0.0 2024-08-19 14:48:10,045 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6500, loss[loss=0.1243, beats_loss=0.01029, ecapa_loss=0.0001045, whisper_loss=0.113, over 23626.00 frames. 
], tot_loss[loss=0.1032, beats_loss=0.01035, ecapa_loss=0.0001413, whisper_loss=0.09147, over 3887136.81 frames. ], batch size: 90, lr: 1.98e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:48:23,054 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 24 from Vox, 41 fro AS 2024-08-19 14:48:36,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=4510990.0, ans=0.0 2024-08-19 14:48:41,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4511090.0, ans=0.125 2024-08-19 14:48:49,212 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=4511090.0, ans=0.125 2024-08-19 14:48:54,493 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 27 from LS+wenet, 21 from Vox, 39 fro AS 2024-08-19 14:48:59,809 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4511190.0, ans=0.1 2024-08-19 14:49:13,654 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.923e+01 2.374e+01 2.606e+01 2.807e+01 1.094e+02, threshold=5.213e+01, percent-clipped=1.0 2024-08-19 14:49:23,906 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6550, loss[loss=0.1065, beats_loss=0.01038, ecapa_loss=0.0001477, whisper_loss=0.09462, over 20808.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01042, ecapa_loss=0.0001406, whisper_loss=0.09124, over 3911109.41 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:49:24,682 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.71 vs. limit=10.0 2024-08-19 14:49:33,732 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.10 vs. 
limit=15.0 2024-08-19 14:49:41,702 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 21 from LS+wenet, 22 from Vox, 46 fro AS 2024-08-19 14:49:47,467 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 29 from Vox, 33 fro AS 2024-08-19 14:49:47,983 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4511490.0, ans=0.0 2024-08-19 14:49:55,942 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4511590.0, ans=0.125 2024-08-19 14:50:11,365 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4511690.0, ans=0.125 2024-08-19 14:50:26,163 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4511790.0, ans=0.1 2024-08-19 14:50:38,607 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6600, loss[loss=0.1106, beats_loss=0.01089, ecapa_loss=0.0001242, whisper_loss=0.09845, over 23139.00 frames. ], tot_loss[loss=0.1032, beats_loss=0.01041, ecapa_loss=0.0001409, whisper_loss=0.09143, over 3925400.19 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:50:40,071 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 21 from LS+wenet, 18 from Vox, 45 fro AS 2024-08-19 14:50:46,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4511890.0, ans=0.1 2024-08-19 14:50:49,447 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 17 from LS+wenet, 18 from Vox, 31 fro AS 2024-08-19 14:50:50,993 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 20 from LS+wenet, 15 from Vox, 29 fro AS 2024-08-19 14:51:04,675 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.12 vs. 
limit=6.0 2024-08-19 14:51:08,790 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4512090.0, ans=0.125 2024-08-19 14:51:37,228 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 27 from Vox, 20 fro AS 2024-08-19 14:51:44,339 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.972e+01 2.291e+01 2.479e+01 2.799e+01 1.189e+02, threshold=4.958e+01, percent-clipped=2.0 2024-08-19 14:51:53,301 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6650, loss[loss=0.1085, beats_loss=0.01017, ecapa_loss=0.0001637, whisper_loss=0.09664, over 21891.00 frames. ], tot_loss[loss=0.1033, beats_loss=0.01042, ecapa_loss=0.0001413, whisper_loss=0.0915, over 3939690.99 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:52:07,720 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=8.0 2024-08-19 14:52:08,730 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=10.0 2024-08-19 14:52:16,899 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 24 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 14:52:23,138 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=4512590.0, ans=0.0 2024-08-19 14:52:35,776 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 20 from LS+wenet, 13 from Vox, 25 fro AS 2024-08-19 14:53:01,779 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 25 from LS+wenet, 9 from Vox, 21 fro AS 2024-08-19 14:53:04,789 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6700, loss[loss=0.1017, beats_loss=0.01219, ecapa_loss=0.0001379, whisper_loss=0.08813, over 22915.00 frames. 
], tot_loss[loss=0.1028, beats_loss=0.01042, ecapa_loss=0.0001406, whisper_loss=0.09101, over 3897807.61 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:53:04,901 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 25 from LS+wenet, 17 from Vox, 23 fro AS 2024-08-19 14:53:20,553 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4512990.0, ans=0.1 2024-08-19 14:53:28,769 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.66 vs. limit=15.0 2024-08-19 14:53:33,359 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4512990.0, ans=0.125 2024-08-19 14:53:41,209 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 19 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 14:53:47,475 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 33 from LS+wenet, 16 from Vox, 37 fro AS 2024-08-19 14:54:14,030 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.894e+01 2.375e+01 2.659e+01 3.001e+01 3.941e+01, threshold=5.319e+01, percent-clipped=0.0 2024-08-19 14:54:23,298 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6750, loss[loss=0.09752, beats_loss=0.01019, ecapa_loss=0.0001602, whisper_loss=0.08572, over 20410.00 frames. ], tot_loss[loss=0.103, beats_loss=0.01041, ecapa_loss=0.0001396, whisper_loss=0.09119, over 3865727.81 frames. 
], batch size: 87, lr: 1.97e-03, grad_scale: 1.4411518807585587e+17 2024-08-19 14:54:32,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=4513390.0, ans=0.125 2024-08-19 14:54:48,239 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4513490.0, ans=0.125 2024-08-19 14:54:49,418 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4513490.0, ans=0.1 2024-08-19 14:55:06,648 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=1.780e+00 2024-08-19 14:55:07,829 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4513690.0, ans=0.125 2024-08-19 14:55:27,490 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 20 from LS+wenet, 28 from Vox, 30 fro AS 2024-08-19 14:55:28,869 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6800, loss[loss=0.08807, beats_loss=0.00976, ecapa_loss=0.000167, whisper_loss=0.07664, over 18643.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001416, whisper_loss=0.09081, over 3857187.55 frames. ], batch size: 78, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:55:40,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4513990.0, ans=0.125 2024-08-19 14:55:42,390 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 31 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 14:55:45,987 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
17 from LS+wenet, 16 from Vox, 39 fro AS 2024-08-19 14:55:46,324 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4513990.0, ans=0.125 2024-08-19 14:55:47,794 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=4513990.0, ans=0.025 2024-08-19 14:55:53,673 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4514090.0, ans=0.125 2024-08-19 14:56:07,020 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 14 from LS+wenet, 16 from Vox, 31 fro AS 2024-08-19 14:56:09,454 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 21 from LS+wenet, 20 from Vox, 21 fro AS 2024-08-19 14:56:23,585 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.785e+01 2.299e+01 2.607e+01 2.944e+01 3.489e+02, threshold=5.214e+01, percent-clipped=2.0 2024-08-19 14:56:31,350 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6850, loss[loss=0.08245, beats_loss=0.01357, ecapa_loss=0.0001282, whisper_loss=0.06759, over 16316.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01045, ecapa_loss=0.0001411, whisper_loss=0.09002, over 3834039.34 frames. ], batch size: 69, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:56:40,385 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4514390.0, ans=0.0 2024-08-19 14:56:53,332 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-08-19 14:56:54,875 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
24 from LS+wenet, 19 from Vox, 33 fro AS 2024-08-19 14:56:57,607 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4514590.0, ans=0.125 2024-08-19 14:56:59,946 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 27 from LS+wenet, 14 from Vox, 22 fro AS 2024-08-19 14:57:01,498 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4514590.0, ans=0.0 2024-08-19 14:57:01,511 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4514590.0, ans=0.0 2024-08-19 14:57:05,533 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4514590.0, ans=0.0 2024-08-19 14:57:08,875 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 22 from Vox, 47 fro AS 2024-08-19 14:57:17,901 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2024-08-19 14:57:23,987 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4514790.0, ans=0.125 2024-08-19 14:57:26,023 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 24 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 14:57:31,410 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=4514790.0, ans=0.0 2024-08-19 14:57:33,382 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6900, loss[loss=0.09729, beats_loss=0.009655, ecapa_loss=0.0001574, whisper_loss=0.08606, over 22089.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001395, whisper_loss=0.08995, over 3859464.46 frames. 
], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:57:36,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=4514890.0, ans=10.0 2024-08-19 14:57:42,463 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 19 from LS+wenet, 20 from Vox, 18 fro AS 2024-08-19 14:57:53,520 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=4514990.0, ans=0.5 2024-08-19 14:58:03,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4515090.0, ans=0.125 2024-08-19 14:58:18,330 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4515190.0, ans=10.0 2024-08-19 14:58:27,593 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.814e+01 2.255e+01 2.566e+01 2.840e+01 4.154e+01, threshold=5.133e+01, percent-clipped=0.0 2024-08-19 14:58:35,199 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 6950, loss[loss=0.1077, beats_loss=0.01055, ecapa_loss=0.0001483, whisper_loss=0.09567, over 20672.00 frames. ], tot_loss[loss=0.1017, beats_loss=0.01055, ecapa_loss=0.0001406, whisper_loss=0.08976, over 3858332.62 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:58:39,531 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4515390.0, ans=0.125 2024-08-19 14:58:42,534 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-08-19 14:58:44,246 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 
21 from LS+wenet, 18 from Vox, 52 fro AS 2024-08-19 14:58:49,795 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0 2024-08-19 14:58:52,122 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.11 vs. limit=10.0 2024-08-19 14:59:15,403 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 16 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 14:59:17,842 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 29 from LS+wenet, 17 from Vox, 42 fro AS 2024-08-19 14:59:25,576 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2024-08-19 14:59:30,199 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-08-19 14:59:31,011 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 18 from LS+wenet, 19 from Vox, 24 fro AS 2024-08-19 14:59:37,385 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7000, loss[loss=0.09099, beats_loss=0.01036, ecapa_loss=0.0001548, whisper_loss=0.07909, over 18965.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01051, ecapa_loss=0.0001415, whisper_loss=0.09015, over 3848774.81 frames. ], batch size: 77, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 14:59:39,852 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 
24 from LS+wenet, 27 from Vox, 33 fro AS 2024-08-19 14:59:41,370 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4515890.0, ans=0.0 2024-08-19 14:59:42,586 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=4515890.0, ans=10.0 2024-08-19 14:59:44,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4515890.0, ans=0.125 2024-08-19 14:59:47,149 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4515890.0, ans=0.2 2024-08-19 14:59:53,966 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4515990.0, ans=0.0 2024-08-19 15:00:09,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4516090.0, ans=0.1 2024-08-19 15:00:11,205 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.51 vs. limit=22.5 2024-08-19 15:00:14,808 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-08-19 15:00:30,226 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 
16 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 15:00:31,228 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 2.030e+01 2.441e+01 2.633e+01 3.064e+01 9.212e+01, threshold=5.267e+01, percent-clipped=3.0 2024-08-19 15:00:33,106 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4516290.0, ans=0.125 2024-08-19 15:00:38,726 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7050, loss[loss=0.09932, beats_loss=0.009385, ecapa_loss=0.0001385, whisper_loss=0.08854, over 21050.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001432, whisper_loss=0.08991, over 3869319.71 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:00:51,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4516490.0, ans=0.1 2024-08-19 15:00:54,620 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2024-08-19 15:00:56,765 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4516490.0, ans=0.125 2024-08-19 15:01:00,341 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4516490.0, ans=0.125 2024-08-19 15:01:26,376 INFO [train_multi_KD3.py:844] (3/4) A total of 71 cuts. 22 from LS+wenet, 21 from Vox, 28 fro AS 2024-08-19 15:01:33,967 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4516790.0, ans=0.125 2024-08-19 15:01:40,653 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7100, loss[loss=0.1046, beats_loss=0.01077, ecapa_loss=0.0001216, whisper_loss=0.09262, over 20966.00 frames. 
], tot_loss[loss=0.1013, beats_loss=0.01055, ecapa_loss=0.0001414, whisper_loss=0.08934, over 3858065.56 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:02:09,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4517090.0, ans=0.125 2024-08-19 15:02:22,797 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.80 vs. limit=15.0 2024-08-19 15:02:32,985 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.28 vs. limit=12.0 2024-08-19 15:02:35,776 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.673e+01 2.230e+01 2.578e+01 2.810e+01 3.581e+01, threshold=5.156e+01, percent-clipped=0.0 2024-08-19 15:02:41,381 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2024-08-19 15:02:43,322 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7150, loss[loss=0.1033, beats_loss=0.01063, ecapa_loss=0.0001429, whisper_loss=0.09125, over 16409.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01052, ecapa_loss=0.0001408, whisper_loss=0.09001, over 3888483.78 frames. 
], batch size: 64, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:02:45,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4517390.0, ans=0.1 2024-08-19 15:02:48,777 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=4517390.0, ans=0.2 2024-08-19 15:02:51,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4517390.0, ans=0.125 2024-08-19 15:02:54,610 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4517490.0, ans=0.1 2024-08-19 15:02:55,923 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4517490.0, ans=0.0 2024-08-19 15:03:11,625 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=4517590.0, ans=0.2 2024-08-19 15:03:12,674 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 16 from LS+wenet, 15 from Vox, 26 fro AS 2024-08-19 15:03:25,565 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=4517690.0, ans=0.125 2024-08-19 15:03:29,400 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.79 vs. 
limit=22.5 2024-08-19 15:03:35,362 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=4517790.0, ans=6.0 2024-08-19 15:03:41,569 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4517790.0, ans=0.07 2024-08-19 15:03:43,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=4517790.0, ans=0.125 2024-08-19 15:03:45,991 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7200, loss[loss=0.1057, beats_loss=0.009058, ecapa_loss=0.0001311, whisper_loss=0.09534, over 23846.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001403, whisper_loss=0.0902, over 3898497.18 frames. ], batch size: 92, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:03:50,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4517890.0, ans=0.1 2024-08-19 15:04:18,153 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 32 from LS+wenet, 26 from Vox, 36 fro AS 2024-08-19 15:04:39,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.908e+01 2.333e+01 2.621e+01 2.974e+01 6.907e+01, threshold=5.242e+01, percent-clipped=0.0 2024-08-19 15:04:42,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4518290.0, ans=0.0 2024-08-19 15:04:46,967 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7250, loss[loss=0.1068, beats_loss=0.006727, ecapa_loss=0.0001792, whisper_loss=0.09833, over 17114.00 frames. ], tot_loss[loss=0.1014, beats_loss=0.01055, ecapa_loss=0.0001411, whisper_loss=0.0894, over 3897689.36 frames. 
], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:04:47,514 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4518390.0, ans=0.0 2024-08-19 15:04:56,487 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2024-08-19 15:05:02,873 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 14 from LS+wenet, 15 from Vox, 31 fro AS 2024-08-19 15:05:05,233 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 23 from LS+wenet, 19 from Vox, 18 fro AS 2024-08-19 15:05:05,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4518490.0, ans=0.125 2024-08-19 15:05:32,256 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 35 from LS+wenet, 20 from Vox, 32 fro AS 2024-08-19 15:05:47,603 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7300, loss[loss=0.1077, beats_loss=0.009759, ecapa_loss=0.0001377, whisper_loss=0.09658, over 23014.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.0105, ecapa_loss=0.000142, whisper_loss=0.08995, over 3895454.83 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:05:53,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4518890.0, ans=0.125 2024-08-19 15:05:55,268 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 
20 from LS+wenet, 18 from Vox, 28 fro AS 2024-08-19 15:05:55,539 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=4518890.0, ans=0.2 2024-08-19 15:05:57,839 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=4518890.0, ans=0.125 2024-08-19 15:06:06,212 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.17 vs. limit=10.0 2024-08-19 15:06:10,318 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4518990.0, ans=0.0 2024-08-19 15:06:20,899 INFO [train_multi_KD3.py:844] (3/4) A total of 73 cuts. 22 from LS+wenet, 19 from Vox, 32 fro AS 2024-08-19 15:06:27,428 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4519190.0, ans=0.125 2024-08-19 15:06:30,107 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4519190.0, ans=0.2 2024-08-19 15:06:41,778 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.905e+01 2.312e+01 2.518e+01 2.847e+01 5.686e+01, threshold=5.035e+01, percent-clipped=2.0 2024-08-19 15:06:47,158 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4519290.0, ans=0.125 2024-08-19 15:06:49,087 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7350, loss[loss=0.1111, beats_loss=0.009967, ecapa_loss=0.0001201, whisper_loss=0.09997, over 23044.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01043, ecapa_loss=0.000142, whisper_loss=0.09052, over 3874659.28 frames. 
], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:06:57,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4519390.0, ans=0.1 2024-08-19 15:07:12,482 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=4519590.0, ans=0.0 2024-08-19 15:07:22,081 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4519590.0, ans=0.125 2024-08-19 15:07:27,483 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=4519690.0, ans=0.0 2024-08-19 15:07:29,775 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4519690.0, ans=0.0 2024-08-19 15:07:32,305 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=15.0 2024-08-19 15:07:33,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4519690.0, ans=0.1 2024-08-19 15:07:50,034 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7400, loss[loss=0.09528, beats_loss=0.01189, ecapa_loss=0.000143, whisper_loss=0.08196, over 19232.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01041, ecapa_loss=0.000142, whisper_loss=0.09052, over 3864349.90 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:07:50,844 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.83 vs. limit=15.0 2024-08-19 15:08:13,759 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 36 from LS+wenet, 15 from Vox, 39 fro AS 2024-08-19 15:08:17,315 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 
22 from LS+wenet, 25 from Vox, 29 fro AS 2024-08-19 15:08:17,679 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4520090.0, ans=0.125 2024-08-19 15:08:18,553 INFO [train_multi_KD3.py:844] (3/4) A total of 65 cuts. 21 from LS+wenet, 16 from Vox, 28 fro AS 2024-08-19 15:08:19,623 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 29 from LS+wenet, 21 from Vox, 37 fro AS 2024-08-19 15:08:22,657 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=4520090.0, ans=0.95 2024-08-19 15:08:26,092 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 23 from LS+wenet, 21 from Vox, 44 fro AS 2024-08-19 15:08:35,836 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 15 from LS+wenet, 19 from Vox, 23 fro AS 2024-08-19 15:08:39,815 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.881e+01 2024-08-19 15:08:44,708 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 30 from LS+wenet, 19 from Vox, 39 fro AS 2024-08-19 15:08:47,145 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.604e+01 2.272e+01 2.510e+01 2.684e+01 4.218e+01, threshold=5.020e+01, percent-clipped=0.0 2024-08-19 15:08:52,222 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4520290.0, ans=0.125 2024-08-19 15:08:53,986 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7450, loss[loss=0.1118, beats_loss=0.007601, ecapa_loss=0.0001815, whisper_loss=0.1024, over 21798.00 frames. ], tot_loss[loss=0.1031, beats_loss=0.01035, ecapa_loss=0.0001431, whisper_loss=0.0913, over 3881083.10 frames. 
], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:09:06,266 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=4520490.0, ans=0.125 2024-08-19 15:09:19,758 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4520590.0, ans=0.125 2024-08-19 15:09:22,476 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:09:22,563 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4520590.0, ans=0.125 2024-08-19 15:09:29,820 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:09:48,207 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4520790.0, ans=0.125 2024-08-19 15:09:54,015 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 24 from LS+wenet, 22 from Vox, 48 fro AS 2024-08-19 15:09:56,182 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7500, loss[loss=0.08623, beats_loss=0.01157, ecapa_loss=0.0001377, whisper_loss=0.07328, over 15844.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01043, ecapa_loss=0.000142, whisper_loss=0.09072, over 3896966.81 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:09:59,264 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=4520890.0, ans=0.0 2024-08-19 15:10:00,965 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.46 vs. 
limit=15.0 2024-08-19 15:10:04,065 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4520890.0, ans=0.1 2024-08-19 15:10:25,494 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4521090.0, ans=0.0 2024-08-19 15:10:31,388 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. limit=22.5 2024-08-19 15:10:51,088 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.958e+01 2.277e+01 2.526e+01 2.958e+01 5.169e+01, threshold=5.052e+01, percent-clipped=1.0 2024-08-19 15:10:58,354 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7550, loss[loss=0.1003, beats_loss=0.01079, ecapa_loss=0.0001304, whisper_loss=0.08824, over 22851.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01043, ecapa_loss=0.0001416, whisper_loss=0.09018, over 3872761.38 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:10:59,755 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 20 from LS+wenet, 12 from Vox, 29 fro AS 2024-08-19 15:11:19,109 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 17 from LS+wenet, 30 from Vox, 30 fro AS 2024-08-19 15:11:22,759 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 
18 from LS+wenet, 13 from Vox, 28 fro AS 2024-08-19 15:11:24,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4521590.0, ans=0.125 2024-08-19 15:11:26,427 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=4521590.0, ans=0.1 2024-08-19 15:11:30,437 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=4521590.0, ans=0.125 2024-08-19 15:11:30,887 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-08-19 15:11:39,871 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 21 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 15:11:40,174 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4521690.0, ans=0.125 2024-08-19 15:11:43,478 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 30 from LS+wenet, 28 from Vox, 34 fro AS 2024-08-19 15:11:48,414 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 17 from LS+wenet, 15 from Vox, 27 fro AS 2024-08-19 15:11:59,533 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7600, loss[loss=0.08612, beats_loss=0.01322, ecapa_loss=0.0001457, whisper_loss=0.07144, over 21917.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.01045, ecapa_loss=0.0001415, whisper_loss=0.08965, over 3848113.92 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:12:07,476 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.09 vs. limit=15.0 2024-08-19 15:12:21,579 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.67 vs. 
limit=8.0 2024-08-19 15:12:30,195 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 25 from LS+wenet, 13 from Vox, 24 fro AS 2024-08-19 15:12:45,279 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4522190.0, ans=0.1 2024-08-19 15:12:46,598 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.778e-02 2024-08-19 15:12:47,642 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 36 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 15:12:54,041 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.930e+01 2.291e+01 2.555e+01 2.938e+01 1.089e+02, threshold=5.110e+01, percent-clipped=2.0 2024-08-19 15:12:56,545 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 31 from LS+wenet, 25 from Vox, 32 fro AS 2024-08-19 15:13:01,261 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7650, loss[loss=0.09652, beats_loss=0.009008, ecapa_loss=0.0001377, whisper_loss=0.08614, over 19822.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01037, ecapa_loss=0.0001417, whisper_loss=0.09033, over 3842516.94 frames. ], batch size: 79, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:13:14,183 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 31 from LS+wenet, 20 from Vox, 39 fro AS 2024-08-19 15:13:20,374 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 24 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 15:13:37,631 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=4522690.0, ans=0.04949747468305833 2024-08-19 15:13:42,049 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 28 from LS+wenet, 21 from Vox, 38 fro AS 2024-08-19 15:14:02,338 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7700, loss[loss=0.08059, beats_loss=0.01228, ecapa_loss=0.0001386, whisper_loss=0.06693, over 16406.00 frames. 
], tot_loss[loss=0.102, beats_loss=0.0104, ecapa_loss=0.0001414, whisper_loss=0.09016, over 3865281.21 frames. ], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:14:02,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4522890.0, ans=0.125 2024-08-19 15:14:08,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4522890.0, ans=0.1 2024-08-19 15:14:50,900 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 18 from LS+wenet, 20 from Vox, 42 fro AS 2024-08-19 15:14:55,510 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.866e+01 2.405e+01 2.630e+01 2.865e+01 8.039e+01, threshold=5.260e+01, percent-clipped=1.0 2024-08-19 15:15:00,027 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4523290.0, ans=0.1 2024-08-19 15:15:00,824 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 20 from LS+wenet, 22 from Vox, 51 fro AS 2024-08-19 15:15:03,003 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7750, loss[loss=0.09419, beats_loss=0.01255, ecapa_loss=0.0001431, whisper_loss=0.08021, over 21466.00 frames. ], tot_loss[loss=0.1011, beats_loss=0.01052, ecapa_loss=0.0001403, whisper_loss=0.08913, over 3865333.53 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:15:05,598 INFO [train_multi_KD3.py:844] (3/4) A total of 76 cuts. 20 from LS+wenet, 21 from Vox, 35 fro AS 2024-08-19 15:15:27,988 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4523590.0, ans=0.0 2024-08-19 15:15:34,217 INFO [scaling.py:1024] (3/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.79 vs. 
limit=5.0
2024-08-19 15:15:36,074 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4523590.0, ans=0.125
2024-08-19 15:16:03,691 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7800, loss[loss=0.105, beats_loss=0.01055, ecapa_loss=0.0001423, whisper_loss=0.09307, over 22260.00 frames. ], tot_loss[loss=0.1008, beats_loss=0.01055, ecapa_loss=0.00014, whisper_loss=0.08884, over 3870755.59 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:16:08,499 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 21 from LS+wenet, 18 from Vox, 33 from AS
2024-08-19 15:16:15,976 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=4523990.0, ans=0.2
2024-08-19 15:16:28,594 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 31 from LS+wenet, 20 from Vox, 32 from AS
2024-08-19 15:16:29,821 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 21 from LS+wenet, 16 from Vox, 23 from AS
2024-08-19 15:16:31,173 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4524090.0, ans=0.1
2024-08-19 15:16:31,393 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.5
2024-08-19 15:16:32,628 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=4524090.0, ans=0.07
2024-08-19 15:16:42,463 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=4524190.0, ans=0.125
2024-08-19 15:16:43,618 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4524190.0, ans=0.125
2024-08-19 15:16:48,705 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0
2024-08-19 15:16:50,947 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4524290.0, ans=0.1
2024-08-19 15:16:56,328 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.665e+01 2.331e+01 2.591e+01 2.941e+01 6.755e+01, threshold=5.181e+01, percent-clipped=1.0
2024-08-19 15:17:01,622 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4524290.0, ans=0.125
2024-08-19 15:17:03,387 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7850, loss[loss=0.1036, beats_loss=0.009649, ecapa_loss=0.0001386, whisper_loss=0.09256, over 15074.00 frames. ], tot_loss[loss=0.101, beats_loss=0.01056, ecapa_loss=0.0001403, whisper_loss=0.08907, over 3839793.34 frames. ], batch size: 57, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:17:09,808 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 26 from LS+wenet, 18 from Vox, 24 from AS
2024-08-19 15:17:18,317 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4524490.0, ans=0.125
2024-08-19 15:17:22,054 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=4524490.0, ans=0.0
2024-08-19 15:17:37,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5
2024-08-19 15:17:49,293 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 27 from LS+wenet, 29 from Vox, 35 from AS
2024-08-19 15:17:49,636 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4524690.0, ans=0.1
2024-08-19 15:17:55,408 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4524790.0, ans=0.5
2024-08-19 15:17:55,662 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0
2024-08-19 15:18:03,662 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7900, loss[loss=0.1012, beats_loss=0.00988, ecapa_loss=0.0001904, whisper_loss=0.08938, over 18732.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01056, ecapa_loss=0.0001408, whisper_loss=0.08963, over 3847631.21 frames. ], batch size: 76, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:18:03,807 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 23 from Vox, 42 from AS
2024-08-19 15:18:04,570 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.19 vs. limit=10.0
2024-08-19 15:18:09,759 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 24 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 15:18:10,976 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 19 from Vox, 38 from AS
2024-08-19 15:18:12,535 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4524890.0, ans=0.1
2024-08-19 15:18:13,624 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=4524890.0, ans=0.07
2024-08-19 15:18:26,844 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 25 from LS+wenet, 28 from Vox, 39 from AS
2024-08-19 15:18:34,840 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=4525090.0, ans=0.0
2024-08-19 15:18:42,211 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4525190.0, ans=0.1
2024-08-19 15:18:43,369 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4525190.0, ans=0.07
2024-08-19 15:18:46,852 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4525190.0, ans=0.125
2024-08-19 15:18:49,373 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4525190.0, ans=0.125
2024-08-19 15:18:51,431 INFO [train_multi_KD3.py:844] (3/4) A total of 70 cuts. 27 from LS+wenet, 17 from Vox, 26 from AS
2024-08-19 15:18:56,257 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.850e+01 2.323e+01 2.530e+01 2.924e+01 2.485e+02, threshold=5.060e+01, percent-clipped=4.0
2024-08-19 15:19:03,379 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 7950, loss[loss=0.09373, beats_loss=0.01014, ecapa_loss=0.000136, whisper_loss=0.08223, over 17996.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.0105, ecapa_loss=0.0001417, whisper_loss=0.0902, over 3856199.25 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:19:09,682 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4525390.0, ans=0.04949747468305833
2024-08-19 15:19:13,023 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 28 from LS+wenet, 29 from Vox, 35 from AS
2024-08-19 15:19:13,272 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4525390.0, ans=0.125
2024-08-19 15:19:16,762 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 20 from LS+wenet, 24 from Vox, 24 from AS
2024-08-19 15:19:25,552 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4525490.0, ans=0.125
2024-08-19 15:19:31,560 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=4525590.0, ans=10.0
2024-08-19 15:19:32,941 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=4525590.0, ans=0.125
2024-08-19 15:19:36,955 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5
2024-08-19 15:19:38,428 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.94 vs. limit=15.0
2024-08-19 15:19:39,123 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4525690.0, ans=0.0
2024-08-19 15:19:44,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=4525690.0, ans=0.0
2024-08-19 15:19:47,274 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4525690.0, ans=0.125
2024-08-19 15:19:55,496 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=4525790.0, ans=0.0
2024-08-19 15:19:59,071 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4525790.0, ans=0.125
2024-08-19 15:20:03,579 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8000, loss[loss=0.1074, beats_loss=0.01062, ecapa_loss=0.0001442, whisper_loss=0.09534, over 15759.00 frames. ], tot_loss[loss=0.1019, beats_loss=0.01053, ecapa_loss=0.0001418, whisper_loss=0.08995, over 3834736.99 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:20:05,040 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4525890.0, ans=0.125
2024-08-19 15:20:09,287 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 26 from LS+wenet, 24 from Vox, 41 from AS
2024-08-19 15:20:39,532 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 27 from LS+wenet, 23 from Vox, 42 from AS
2024-08-19 15:20:55,785 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=1.356e+00
2024-08-19 15:20:56,405 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.792e+01 2.249e+01 2.539e+01 2.836e+01 4.576e+01, threshold=5.079e+01, percent-clipped=0.0
2024-08-19 15:21:02,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=4526390.0, ans=0.05
2024-08-19 15:21:03,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8050, loss[loss=0.08351, beats_loss=0.009078, ecapa_loss=0.0001973, whisper_loss=0.07246, over 13959.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01049, ecapa_loss=0.0001413, whisper_loss=0.09046, over 3853932.38 frames. ], batch size: 58, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:21:08,526 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 13 from Vox, 33 from AS
2024-08-19 15:21:13,635 INFO [train_multi_KD3.py:844] (3/4) A total of 87 cuts. 26 from LS+wenet, 19 from Vox, 42 from AS
2024-08-19 15:21:13,924 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4526390.0, ans=0.125
2024-08-19 15:21:52,102 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 from AS
2024-08-19 15:22:03,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8100, loss[loss=0.08486, beats_loss=0.00977, ecapa_loss=0.0001184, whisper_loss=0.07391, over 15085.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01043, ecapa_loss=0.0001409, whisper_loss=0.09062, over 3828149.25 frames. ], batch size: 59, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:22:28,213 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4527090.0, ans=0.2
2024-08-19 15:22:35,941 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0
2024-08-19 15:22:47,783 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2024-08-19 15:22:49,908 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4527190.0, ans=0.1
2024-08-19 15:22:56,202 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.730e+01 2.370e+01 2.531e+01 2.810e+01 1.337e+02, threshold=5.062e+01, percent-clipped=2.0
2024-08-19 15:22:59,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=4527290.0, ans=0.125
2024-08-19 15:23:03,535 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8150, loss[loss=0.1002, beats_loss=0.01168, ecapa_loss=0.0001456, whisper_loss=0.08704, over 22507.00 frames. ], tot_loss[loss=0.1021, beats_loss=0.01047, ecapa_loss=0.0001401, whisper_loss=0.09023, over 3854963.43 frames. ], batch size: 93, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:23:04,102 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.16 vs. limit=15.0
2024-08-19 15:23:26,358 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4527590.0, ans=0.035
2024-08-19 15:23:32,532 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4527590.0, ans=0.1
2024-08-19 15:23:33,902 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4527590.0, ans=0.125
2024-08-19 15:23:37,310 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=4527590.0, ans=0.125
2024-08-19 15:23:59,591 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=4.703e-01
2024-08-19 15:24:03,097 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8200, loss[loss=0.1032, beats_loss=0.01056, ecapa_loss=0.0001698, whisper_loss=0.0909, over 21357.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01044, ecapa_loss=0.0001404, whisper_loss=0.09061, over 3863182.61 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:24:28,213 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.38 vs. limit=10.0
2024-08-19 15:24:31,798 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.84 vs. limit=15.0
2024-08-19 15:24:32,849 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0
2024-08-19 15:24:39,513 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=4528190.0, ans=0.125
2024-08-19 15:24:55,075 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4528290.0, ans=0.125
2024-08-19 15:24:55,842 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.900e+01 2.279e+01 2.474e+01 2.838e+01 8.024e+01, threshold=4.948e+01, percent-clipped=1.0
2024-08-19 15:25:03,234 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8250, loss[loss=0.09011, beats_loss=0.01127, ecapa_loss=0.0001321, whisper_loss=0.07752, over 18036.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01048, ecapa_loss=0.0001401, whisper_loss=0.09038, over 3870945.73 frames. ], batch size: 73, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:25:05,836 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 28 from LS+wenet, 17 from Vox, 46 from AS
2024-08-19 15:25:20,587 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=4528490.0, ans=0.2
2024-08-19 15:25:23,015 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=4528490.0, ans=15.0
2024-08-19 15:25:24,063 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4528490.0, ans=0.1
2024-08-19 15:25:24,323 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.22 vs. limit=22.5
2024-08-19 15:25:29,859 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 16 from LS+wenet, 21 from Vox, 26 from AS
2024-08-19 15:25:31,248 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=4528590.0, ans=0.025
2024-08-19 15:25:51,920 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=4528790.0, ans=0.125
2024-08-19 15:26:03,568 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8300, loss[loss=0.1157, beats_loss=0.008078, ecapa_loss=0.0001336, whisper_loss=0.1062, over 18095.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001395, whisper_loss=0.09031, over 3874687.29 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:26:03,712 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 23 from Vox, 34 from AS
2024-08-19 15:26:06,196 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=4528890.0, ans=0.125
2024-08-19 15:26:16,344 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 33 from LS+wenet, 25 from Vox, 36 from AS
2024-08-19 15:26:28,602 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=4529090.0, ans=0.125
2024-08-19 15:26:33,218 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=3.251e+01
2024-08-19 15:26:51,093 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-08-19 15:26:52,104 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4529290.0, ans=0.015
2024-08-19 15:26:55,592 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.910e+01 2.344e+01 2.567e+01 2.934e+01 1.286e+02, threshold=5.133e+01, percent-clipped=2.0
2024-08-19 15:26:58,458 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=4529290.0, ans=0.125
2024-08-19 15:27:02,880 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8350, loss[loss=0.1142, beats_loss=0.009517, ecapa_loss=0.0001966, whisper_loss=0.1027, over 14100.00 frames. ], tot_loss[loss=0.1015, beats_loss=0.0105, ecapa_loss=0.0001401, whisper_loss=0.0896, over 3859723.80 frames. ], batch size: 60, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:27:06,999 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4529390.0, ans=0.0
2024-08-19 15:27:24,869 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=4529490.0, ans=0.0
2024-08-19 15:27:30,986 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4529590.0, ans=0.125
2024-08-19 15:27:35,778 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=4529590.0, ans=0.1
2024-08-19 15:27:39,103 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=4529690.0, ans=0.015
2024-08-19 15:27:42,724 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4529690.0, ans=0.125
2024-08-19 15:27:47,354 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=4529690.0, ans=0.125
2024-08-19 15:27:48,751 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0
2024-08-19 15:27:49,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5
2024-08-19 15:27:50,465 INFO [train_multi_KD3.py:844] (3/4) A total of 66 cuts. 20 from LS+wenet, 18 from Vox, 28 from AS
2024-08-19 15:27:59,112 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=4529790.0, ans=0.125
2024-08-19 15:28:00,136 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 27 from LS+wenet, 18 from Vox, 29 from AS
2024-08-19 15:28:01,406 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 21 from LS+wenet, 23 from Vox, 39 from AS
2024-08-19 15:28:02,403 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8400, loss[loss=0.09207, beats_loss=0.01194, ecapa_loss=0.0001073, whisper_loss=0.07906, over 21352.00 frames. ], tot_loss[loss=0.102, beats_loss=0.01041, ecapa_loss=0.0001414, whisper_loss=0.09016, over 3858911.64 frames. ], batch size: 83, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:28:06,319 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4529890.0, ans=0.0
2024-08-19 15:28:11,155 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=4529890.0, ans=10.0
2024-08-19 15:28:13,652 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4529990.0, ans=0.04949747468305833
2024-08-19 15:28:22,227 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=4529990.0, ans=0.2
2024-08-19 15:28:42,406 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=4530190.0, ans=0.0
2024-08-19 15:28:54,962 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.797e+01 2.308e+01 2.547e+01 2.800e+01 8.519e+01, threshold=5.094e+01, percent-clipped=2.0
2024-08-19 15:28:57,469 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 20 from LS+wenet, 26 from Vox, 35 from AS
2024-08-19 15:28:59,753 INFO [train_multi_KD3.py:844] (3/4) A total of 89 cuts. 25 from LS+wenet, 24 from Vox, 40 from AS
2024-08-19 15:29:01,979 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8450, loss[loss=0.105, beats_loss=0.008314, ecapa_loss=0.0001524, whisper_loss=0.09512, over 22824.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01037, ecapa_loss=0.0001414, whisper_loss=0.09045, over 3851594.02 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:29:02,371 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4530390.0, ans=0.0
2024-08-19 15:29:04,736 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=4530390.0, ans=0.125
2024-08-19 15:29:04,795 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=4530390.0, ans=0.0
2024-08-19 15:29:18,772 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4530490.0, ans=0.0
2024-08-19 15:29:23,742 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=4530490.0, ans=0.125
2024-08-19 15:29:28,240 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4530590.0, ans=0.2
2024-08-19 15:29:31,793 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 31 from LS+wenet, 18 from Vox, 32 from AS
2024-08-19 15:29:39,121 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=4530690.0, ans=0.0
2024-08-19 15:29:50,166 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.62 vs. limit=15.0
2024-08-19 15:29:58,105 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.93 vs. limit=15.0
2024-08-19 15:30:00,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4530890.0, ans=0.125
2024-08-19 15:30:00,916 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8500, loss[loss=0.1113, beats_loss=0.01059, ecapa_loss=0.0001332, whisper_loss=0.09942, over 22943.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01032, ecapa_loss=0.0001421, whisper_loss=0.09093, over 3891288.67 frames. ], batch size: 88, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:30:01,493 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.85 vs. limit=15.0
2024-08-19 15:30:04,671 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=4530890.0, ans=0.125
2024-08-19 15:30:19,069 INFO [train_multi_KD3.py:844] (3/4) A total of 55 cuts. 19 from LS+wenet, 18 from Vox, 18 from AS
2024-08-19 15:30:20,862 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.93 vs. limit=15.0
2024-08-19 15:30:30,030 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4531090.0, ans=0.0
2024-08-19 15:30:35,189 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=4531090.0, ans=10.0
2024-08-19 15:30:35,858 INFO [train_multi_KD3.py:844] (3/4) A total of 95 cuts. 28 from LS+wenet, 20 from Vox, 47 from AS
2024-08-19 15:30:36,176 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4531190.0, ans=0.1
2024-08-19 15:30:46,457 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4531190.0, ans=0.0
2024-08-19 15:30:49,533 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 29 from LS+wenet, 11 from Vox, 21 from AS
2024-08-19 15:30:53,225 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.824e+01 2.328e+01 2.576e+01 3.014e+01 4.814e+01, threshold=5.151e+01, percent-clipped=0.0
2024-08-19 15:31:00,396 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8550, loss[loss=0.1193, beats_loss=0.01039, ecapa_loss=0.0001657, whisper_loss=0.1073, over 21273.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.0103, ecapa_loss=0.000143, whisper_loss=0.09075, over 3871033.89 frames. ], batch size: 87, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:31:00,540 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 25 from LS+wenet, 15 from Vox, 41 from AS
2024-08-19 15:31:01,702 INFO [train_multi_KD3.py:844] (3/4) A total of 82 cuts. 20 from LS+wenet, 15 from Vox, 47 from AS
2024-08-19 15:31:09,432 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4531390.0, ans=0.125
2024-08-19 15:31:12,736 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 15 from LS+wenet, 11 from Vox, 41 from AS
2024-08-19 15:31:24,944 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=4531590.0, ans=0.125
2024-08-19 15:31:39,430 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=4531690.0, ans=0.0
2024-08-19 15:31:45,507 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=4531690.0, ans=0.0
2024-08-19 15:31:53,781 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4531790.0, ans=0.04949747468305833
2024-08-19 15:32:00,548 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8600, loss[loss=0.1025, beats_loss=0.01095, ecapa_loss=0.0001663, whisper_loss=0.0899, over 20789.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01034, ecapa_loss=0.0001421, whisper_loss=0.0908, over 3874841.51 frames. ], batch size: 82, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:32:04,588 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4531890.0, ans=0.035
2024-08-19 15:32:13,340 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.33 vs. limit=22.5
2024-08-19 15:32:19,270 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0
2024-08-19 15:32:29,596 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-08-19 15:32:29,620 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=4532090.0, ans=0.2
2024-08-19 15:32:41,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4532190.0, ans=0.0
2024-08-19 15:32:41,945 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.63 vs. limit=10.0
2024-08-19 15:32:49,658 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=4532290.0, ans=0.125
2024-08-19 15:32:51,887 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 24 from LS+wenet, 24 from Vox, 42 from AS
2024-08-19 15:32:52,879 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.862e+01 2.343e+01 2.555e+01 2.873e+01 3.984e+01, threshold=5.110e+01, percent-clipped=0.0
2024-08-19 15:32:57,876 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 24 from LS+wenet, 15 from Vox, 42 from AS
2024-08-19 15:33:00,236 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8650, loss[loss=0.1027, beats_loss=0.009727, ecapa_loss=0.0001858, whisper_loss=0.09115, over 15345.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01042, ecapa_loss=0.0001417, whisper_loss=0.09042, over 3891410.94 frames. ], batch size: 66, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:33:11,151 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4532490.0, ans=0.125
2024-08-19 15:33:13,470 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=4532490.0, ans=0.125
2024-08-19 15:33:17,045 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=4532490.0, ans=0.125
2024-08-19 15:33:18,158 INFO [train_multi_KD3.py:844] (3/4) A total of 91 cuts. 31 from LS+wenet, 24 from Vox, 36 from AS
2024-08-19 15:33:20,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4532490.0, ans=0.05
2024-08-19 15:33:25,341 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 18 from LS+wenet, 17 from Vox, 23 from AS
2024-08-19 15:33:27,584 INFO [train_multi_KD3.py:844] (3/4) A total of 78 cuts. 22 from LS+wenet, 24 from Vox, 32 from AS
2024-08-19 15:33:28,774 WARNING [optim.py:496] (3/4) Scaling gradients by 0.054981451481580734, model_norm_threshold=51.102230072021484
2024-08-19 15:33:28,932 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.0.weight with proportion 0.12, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=1.056e+05, grad_sumsq=1.009e+07, orig_rms_sq=1.047e-02
2024-08-19 15:33:31,667 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=4532590.0, ans=0.0
2024-08-19 15:33:36,209 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 25 from LS+wenet, 27 from Vox, 32 from AS
2024-08-19 15:33:53,283 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4532790.0, ans=0.2
2024-08-19 15:33:59,661 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=4532890.0, ans=0.125
2024-08-19 15:33:59,929 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0
2024-08-19 15:34:00,472 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8700, loss[loss=0.1097, beats_loss=0.007579, ecapa_loss=0.0001569, whisper_loss=0.1006, over 20920.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01034, ecapa_loss=0.0001419, whisper_loss=0.09056, over 3880992.81 frames. ], batch size: 81, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:34:00,608 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 17 from LS+wenet, 18 from Vox, 26 from AS
2024-08-19 15:34:13,910 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 19 from LS+wenet, 15 from Vox, 25 from AS
2024-08-19 15:34:22,021 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.19 vs. limit=10.0
2024-08-19 15:34:23,903 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4533090.0, ans=0.125
2024-08-19 15:34:33,209 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 26 from LS+wenet, 22 from Vox, 27 from AS
2024-08-19 15:34:33,423 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=4533090.0, ans=0.035
2024-08-19 15:34:33,855 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.86 vs. limit=22.5
2024-08-19 15:34:40,933 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=4533190.0, ans=0.125
2024-08-19 15:34:53,681 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.820e+01 2.310e+01 2.553e+01 2.787e+01 9.294e+02, threshold=5.105e+01, percent-clipped=1.0
2024-08-19 15:34:58,377 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 21 from LS+wenet, 15 from Vox, 27 from AS
2024-08-19 15:34:58,828 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=4533290.0, ans=0.2
2024-08-19 15:35:00,800 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8750, loss[loss=0.1161, beats_loss=0.009719, ecapa_loss=0.0001077, whisper_loss=0.1053, over 19195.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01033, ecapa_loss=0.0001416, whisper_loss=0.0907, over 3873255.58 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17
2024-08-19 15:35:06,016 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4533390.0, ans=0.0
2024-08-19 15:35:08,279 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 17 from LS+wenet, 21 from Vox, 43 from AS
2024-08-19 15:35:16,593 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=4533490.0, ans=0.04949747468305833
2024-08-19 15:35:17,493 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 26 from LS+wenet, 18 from Vox, 19 from AS
2024-08-19 15:35:21,382 INFO [train_multi_KD3.py:844] (3/4) A total of 60 cuts. 20 from LS+wenet, 17 from Vox, 23 from AS
2024-08-19 15:35:29,631 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 22 from LS+wenet, 27 from Vox, 30 from AS
2024-08-19 15:35:34,464 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 27 from Vox, 35 from AS
2024-08-19 15:35:45,560 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 28 from LS+wenet, 27 from Vox, 38 from AS
2024-08-19 15:35:46,153 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5
2024-08-19 15:35:58,130 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0
2024-08-19 15:36:00,676 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8800, loss[loss=0.103, beats_loss=0.008437, ecapa_loss=0.000144, whisper_loss=0.09314, over 13989.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.0104, ecapa_loss=0.0001404, whisper_loss=0.09086, over 3898130.57 frames. ], batch size: 54, lr: 1.97e-03, grad_scale: 5.764607523034235e+17
2024-08-19 15:36:10,608 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4533890.0, ans=0.1
2024-08-19 15:36:11,871 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=4533990.0, ans=0.04949747468305833
2024-08-19 15:36:22,612 INFO [train_multi_KD3.py:844] (3/4) A total of 93 cuts. 24 from LS+wenet, 28 from Vox, 41 from AS
2024-08-19 15:36:27,260 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 25 from LS+wenet, 17 from Vox, 26 from AS
2024-08-19 15:36:34,702 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=4534090.0, ans=0.0
2024-08-19 15:36:40,569 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 14 from LS+wenet, 19 from Vox, 25 from AS
2024-08-19 15:36:42,904 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts.
23 from LS+wenet, 20 from Vox, 31 fro AS 2024-08-19 15:36:43,435 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4534190.0, ans=0.125 2024-08-19 15:36:50,228 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4534290.0, ans=0.0 2024-08-19 15:36:51,316 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4534290.0, ans=0.125 2024-08-19 15:36:53,351 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.915e+01 2.339e+01 2.639e+01 2.921e+01 5.304e+01, threshold=5.278e+01, percent-clipped=1.0 2024-08-19 15:37:00,584 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8850, loss[loss=0.1056, beats_loss=0.01067, ecapa_loss=0.0001505, whisper_loss=0.09342, over 22194.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.01049, ecapa_loss=0.0001408, whisper_loss=0.08992, over 3881214.69 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:37:04,253 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 26 from LS+wenet, 21 from Vox, 43 fro AS 2024-08-19 15:37:22,315 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 28 from LS+wenet, 23 from Vox, 30 fro AS 2024-08-19 15:37:37,894 INFO [train_multi_KD3.py:844] (3/4) A total of 97 cuts. 28 from LS+wenet, 27 from Vox, 42 fro AS 2024-08-19 15:37:38,136 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4534690.0, ans=0.125 2024-08-19 15:37:47,766 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-08-19 15:37:56,009 INFO [train_multi_KD3.py:844] (3/4) A total of 61 cuts. 
21 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 15:38:00,678 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8900, loss[loss=0.1146, beats_loss=0.009751, ecapa_loss=0.0001387, whisper_loss=0.1034, over 22776.00 frames. ], tot_loss[loss=0.1022, beats_loss=0.01049, ecapa_loss=0.0001398, whisper_loss=0.09029, over 3873842.67 frames. ], batch size: 89, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:38:01,092 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4534890.0, ans=0.1 2024-08-19 15:38:10,098 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0 2024-08-19 15:38:31,522 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 21 from LS+wenet, 16 from Vox, 27 fro AS 2024-08-19 15:38:35,737 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4535090.0, ans=0.125 2024-08-19 15:38:54,824 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.921e+01 2.335e+01 2.657e+01 2.947e+01 4.207e+01, threshold=5.314e+01, percent-clipped=0.0 2024-08-19 15:39:01,355 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=4535390.0, ans=0.125 2024-08-19 15:39:02,158 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 8950, loss[loss=0.09644, beats_loss=0.009426, ecapa_loss=0.000164, whisper_loss=0.08537, over 16325.00 frames. ], tot_loss[loss=0.1025, beats_loss=0.01047, ecapa_loss=0.0001405, whisper_loss=0.09058, over 3895510.02 frames. 
], batch size: 66, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:39:19,669 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=4535490.0, ans=0.125 2024-08-19 15:39:27,635 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 40 from LS+wenet, 20 from Vox, 30 fro AS 2024-08-19 15:39:30,334 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4535590.0, ans=0.125 2024-08-19 15:39:31,414 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4535590.0, ans=0.0 2024-08-19 15:39:32,482 INFO [train_multi_KD3.py:844] (3/4) A total of 81 cuts. 23 from LS+wenet, 26 from Vox, 32 fro AS 2024-08-19 15:39:34,897 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 14 from LS+wenet, 15 from Vox, 28 fro AS 2024-08-19 15:39:45,644 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=4535690.0, ans=0.0 2024-08-19 15:39:50,712 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4535790.0, ans=0.125 2024-08-19 15:39:58,962 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4535790.0, ans=0.125 2024-08-19 15:40:01,356 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=4535890.0, ans=0.125 2024-08-19 15:40:02,145 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9000, loss[loss=0.1142, beats_loss=0.008683, ecapa_loss=0.0001615, whisper_loss=0.1039, over 17752.00 frames. ], tot_loss[loss=0.1035, beats_loss=0.01032, ecapa_loss=0.0001419, whisper_loss=0.09173, over 3882804.25 frames. 
], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:40:02,146 INFO [train_multi_KD3.py:1139] (3/4) Computing validation loss 2024-08-19 15:40:14,403 INFO [zipformer.py:1877] (3/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.9427, 2.2450, 2.7585, 2.3361, 2.9662, 2.7819, 2.7812, 2.3855], device='cuda:3') 2024-08-19 15:40:30,374 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on ASR_libri: loss=0.2531, beats_loss=0, ecapa_loss=0.0005084, whisper_loss=0.248, over 922467.00 frames. 2024-08-19 15:40:43,457 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on SV_voxceleb1: loss=0.004046, beats_loss=0, ecapa_loss=0.0004046, whisper_loss=0, over 939242.00 frames. 2024-08-19 15:42:05,913 INFO [train_multi_KD3.py:1149] (3/4) Epoch 31, validation on AT_audioset: loss=0.02311, beats_loss=0.02311, ecapa_loss=0, whisper_loss=0, over 3737520.00 frames. 2024-08-19 15:42:05,916 INFO [train_multi_KD3.py:1155] (3/4) Maximum memory allocated so far is 32111MB 2024-08-19 15:42:09,709 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4535890.0, ans=0.125 2024-08-19 15:42:14,350 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 15:42:19,732 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=4535990.0, ans=0.0 2024-08-19 15:42:23,986 WARNING [optim.py:496] (3/4) Scaling gradients by 0.08310459554195404, model_norm_threshold=53.13531494140625 2024-08-19 15:42:24,144 INFO [optim.py:564] (3/4) Parameter dominating tot_sumsq module.encoder_embed.conv.4.weight with proportion 0.13, where dominant_sumsq=(grad_sumsq*orig_rms_sq)=5.161e+04, grad_sumsq=1.575e+04, orig_rms_sq=3.277e+00 2024-08-19 15:42:34,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4536090.0, ans=0.125 2024-08-19 15:42:41,655 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4536190.0, ans=0.1 2024-08-19 15:42:58,834 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.804e+01 2.296e+01 2.631e+01 2.912e+01 6.394e+02, threshold=5.262e+01, percent-clipped=1.0 2024-08-19 15:43:05,973 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9050, loss[loss=0.09148, beats_loss=0.009753, ecapa_loss=0.0001386, whisper_loss=0.08034, over 15900.00 frames. ], tot_loss[loss=0.1034, beats_loss=0.01032, ecapa_loss=0.0001416, whisper_loss=0.0917, over 3890316.27 frames. ], batch size: 61, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:43:15,701 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 33 from LS+wenet, 22 from Vox, 24 fro AS 2024-08-19 15:43:34,260 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.79 vs. 
limit=22.5 2024-08-19 15:43:40,788 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=4536690.0, ans=0.125 2024-08-19 15:43:42,116 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=4536690.0, ans=0.2 2024-08-19 15:43:42,955 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 27 from LS+wenet, 23 from Vox, 38 fro AS 2024-08-19 15:44:05,266 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9100, loss[loss=0.09098, beats_loss=0.009877, ecapa_loss=0.0001946, whisper_loss=0.07916, over 12688.00 frames. ], tot_loss[loss=0.1028, beats_loss=0.01038, ecapa_loss=0.0001423, whisper_loss=0.09099, over 3871736.28 frames. ], batch size: 53, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:44:29,807 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=4537090.0, ans=0.0 2024-08-19 15:44:43,372 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 33 from LS+wenet, 21 from Vox, 34 fro AS 2024-08-19 15:44:52,707 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4537190.0, ans=0.125 2024-08-19 15:44:55,747 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=4537290.0, ans=0.0 2024-08-19 15:44:57,531 INFO [train_multi_KD3.py:844] (3/4) A total of 94 cuts. 
31 from LS+wenet, 21 from Vox, 42 fro AS 2024-08-19 15:44:59,433 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=4537290.0, ans=0.2 2024-08-19 15:45:03,564 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.907e+01 2.267e+01 2.524e+01 2.813e+01 7.972e+01, threshold=5.047e+01, percent-clipped=1.0 2024-08-19 15:45:07,845 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-08-19 15:45:11,595 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=4537390.0, ans=0.125 2024-08-19 15:45:12,320 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9150, loss[loss=0.1235, beats_loss=0.01112, ecapa_loss=0.0001277, whisper_loss=0.1111, over 23460.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.000141, whisper_loss=0.0905, over 3896685.80 frames. ], batch size: 91, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:45:13,296 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.89 vs. limit=10.0 2024-08-19 15:45:27,889 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 23 from LS+wenet, 20 from Vox, 49 fro AS 2024-08-19 15:45:31,980 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=4537490.0, ans=0.125 2024-08-19 15:45:33,528 INFO [train_multi_KD3.py:844] (3/4) A total of 69 cuts. 29 from LS+wenet, 13 from Vox, 27 fro AS 2024-08-19 15:45:36,222 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 30 from LS+wenet, 19 from Vox, 41 fro AS 2024-08-19 15:46:00,219 INFO [train_multi_KD3.py:844] (3/4) A total of 86 cuts. 
24 from LS+wenet, 23 from Vox, 39 fro AS 2024-08-19 15:46:01,465 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 24 from LS+wenet, 17 from Vox, 34 fro AS 2024-08-19 15:46:03,179 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.53 vs. limit=22.5 2024-08-19 15:46:05,680 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=4537790.0, ans=0.125 2024-08-19 15:46:07,915 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=22.5 2024-08-19 15:46:09,280 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4537790.0, ans=0.1 2024-08-19 15:46:15,705 INFO [train_multi_KD3.py:844] (3/4) A total of 84 cuts. 28 from LS+wenet, 27 from Vox, 29 fro AS 2024-08-19 15:46:19,733 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9200, loss[loss=0.1139, beats_loss=0.008796, ecapa_loss=0.0001494, whisper_loss=0.1036, over 19262.00 frames. ], tot_loss[loss=0.1024, beats_loss=0.01047, ecapa_loss=0.0001412, whisper_loss=0.09049, over 3923798.50 frames. ], batch size: 74, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:46:21,289 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4537890.0, ans=0.125 2024-08-19 15:46:22,756 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=4537890.0, ans=0.0 2024-08-19 15:46:26,674 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4537890.0, ans=0.125 2024-08-19 15:46:34,339 INFO [train_multi_KD3.py:844] (3/4) A total of 72 cuts. 
17 from LS+wenet, 27 from Vox, 28 fro AS 2024-08-19 15:46:51,368 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 33 from LS+wenet, 22 from Vox, 35 fro AS 2024-08-19 15:46:52,813 INFO [train_multi_KD3.py:844] (3/4) A total of 74 cuts. 19 from LS+wenet, 25 from Vox, 30 fro AS 2024-08-19 15:47:04,128 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-08-19 15:47:07,599 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4538190.0, ans=0.1 2024-08-19 15:47:08,891 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=4538190.0, ans=0.07 2024-08-19 15:47:16,757 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.886e+01 2.333e+01 2.570e+01 2.855e+01 5.209e+01, threshold=5.141e+01, percent-clipped=1.0 2024-08-19 15:47:24,291 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9250, loss[loss=0.1095, beats_loss=0.009928, ecapa_loss=0.0001344, whisper_loss=0.0982, over 17712.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01055, ecapa_loss=0.0001403, whisper_loss=0.08963, over 3920486.97 frames. ], batch size: 71, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:47:29,571 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=4538390.0, ans=0.0 2024-08-19 15:47:38,159 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.60 vs. limit=22.5 2024-08-19 15:47:53,129 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 
24 from LS+wenet, 22 from Vox, 42 fro AS 2024-08-19 15:47:57,025 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=4538590.0, ans=0.0 2024-08-19 15:48:06,165 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4538690.0, ans=0.125 2024-08-19 15:48:14,904 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=4538690.0, ans=0.125 2024-08-19 15:48:16,458 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-08-19 15:48:18,698 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4538790.0, ans=0.125 2024-08-19 15:48:22,062 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 32 from LS+wenet, 23 from Vox, 35 fro AS 2024-08-19 15:48:29,377 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9300, loss[loss=0.09349, beats_loss=0.01002, ecapa_loss=0.0001421, whisper_loss=0.08204, over 15228.00 frames. ], tot_loss[loss=0.1027, beats_loss=0.01044, ecapa_loss=0.000141, whisper_loss=0.09081, over 3946874.09 frames. ], batch size: 62, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:48:31,159 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4538890.0, ans=0.125 2024-08-19 15:48:40,582 INFO [train_multi_KD3.py:844] (3/4) A total of 80 cuts. 
27 from LS+wenet, 22 from Vox, 31 fro AS 2024-08-19 15:48:42,056 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=4538990.0, ans=0.07 2024-08-19 15:48:42,080 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=4538990.0, ans=0.0 2024-08-19 15:48:42,916 INFO [train_multi_KD3.py:844] (3/4) A total of 79 cuts. 23 from LS+wenet, 25 from Vox, 31 fro AS 2024-08-19 15:48:45,891 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.15 vs. limit=10.0 2024-08-19 15:49:02,301 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 26 from LS+wenet, 24 from Vox, 38 fro AS 2024-08-19 15:49:10,657 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.14 vs. limit=22.5 2024-08-19 15:49:16,224 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4539190.0, ans=0.95 2024-08-19 15:49:25,214 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.995e+01 2.419e+01 2.678e+01 2.934e+01 3.690e+01, threshold=5.356e+01, percent-clipped=0.0 2024-08-19 15:49:25,583 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=4539290.0, ans=0.125 2024-08-19 15:49:33,378 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9350, loss[loss=0.1172, beats_loss=0.009477, ecapa_loss=0.0001741, whisper_loss=0.1059, over 21716.00 frames. ], tot_loss[loss=0.1029, beats_loss=0.01042, ecapa_loss=0.0001409, whisper_loss=0.09103, over 3926570.18 frames. 
], batch size: 90, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:49:45,806 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=4539490.0, ans=0.0 2024-08-19 15:49:48,335 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4539490.0, ans=0.1 2024-08-19 15:50:10,420 INFO [train_multi_KD3.py:844] (3/4) A total of 57 cuts. 17 from LS+wenet, 15 from Vox, 25 fro AS 2024-08-19 15:50:20,506 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=4539690.0, ans=0.0 2024-08-19 15:50:35,127 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9400, loss[loss=0.1017, beats_loss=0.008169, ecapa_loss=0.0001383, whisper_loss=0.09216, over 17420.00 frames. ], tot_loss[loss=0.1026, beats_loss=0.01041, ecapa_loss=0.0001413, whisper_loss=0.09077, over 3948337.65 frames. ], batch size: 65, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:50:39,186 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=4539890.0, ans=0.2 2024-08-19 15:50:53,364 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 21 from LS+wenet, 18 from Vox, 36 fro AS 2024-08-19 15:50:54,652 INFO [train_multi_KD3.py:844] (3/4) A total of 88 cuts. 22 from LS+wenet, 21 from Vox, 45 fro AS 2024-08-19 15:50:57,499 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.36 vs. limit=15.0 2024-08-19 15:51:15,359 INFO [train_multi_KD3.py:844] (3/4) A total of 67 cuts. 18 from LS+wenet, 22 from Vox, 27 fro AS 2024-08-19 15:51:18,412 INFO [train_multi_KD3.py:844] (3/4) A total of 77 cuts. 22 from LS+wenet, 20 from Vox, 35 fro AS 2024-08-19 15:51:25,668 INFO [train_multi_KD3.py:844] (3/4) A total of 92 cuts. 
29 from LS+wenet, 27 from Vox, 36 fro AS 2024-08-19 15:51:26,206 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=4540190.0, ans=0.0 2024-08-19 15:51:39,321 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.736e+01 2.311e+01 2.569e+01 2.722e+01 4.090e+01, threshold=5.139e+01, percent-clipped=0.0 2024-08-19 15:51:50,650 INFO [train_multi_KD3.py:844] (3/4) A total of 85 cuts. 32 from LS+wenet, 17 from Vox, 36 fro AS 2024-08-19 15:51:51,457 INFO [scaling.py:1120] (3/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.640e-02 2024-08-19 15:51:52,099 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9450, loss[loss=0.118, beats_loss=0.009843, ecapa_loss=0.0001314, whisper_loss=0.1069, over 21965.00 frames. ], tot_loss[loss=0.1018, beats_loss=0.0105, ecapa_loss=0.0001413, whisper_loss=0.08991, over 3925012.98 frames. ], batch size: 85, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:51:55,419 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=4540390.0, ans=0.0 2024-08-19 15:51:56,785 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4540390.0, ans=0.125 2024-08-19 15:52:26,147 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4540590.0, ans=0.1 2024-08-19 15:52:27,915 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=4540590.0, ans=10.0 2024-08-19 15:52:42,649 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.49 vs. 
limit=12.0 2024-08-19 15:53:03,235 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=4540790.0, ans=0.05 2024-08-19 15:53:13,520 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9500, loss[loss=0.08114, beats_loss=0.01311, ecapa_loss=0.0001311, whisper_loss=0.06672, over 19994.00 frames. ], tot_loss[loss=0.1023, beats_loss=0.01051, ecapa_loss=0.0001408, whisper_loss=0.0904, over 3944869.51 frames. ], batch size: 84, lr: 1.97e-03, grad_scale: 5.764607523034235e+17 2024-08-19 15:53:16,729 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=4540890.0, ans=0.0 2024-08-19 15:53:28,144 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4540990.0, ans=0.0 2024-08-19 15:53:52,496 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-08-19 15:54:05,444 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=4541190.0, ans=0.95 2024-08-19 15:54:30,529 INFO [train_multi_KD3.py:844] (3/4) A total of 58 cuts. 21 from LS+wenet, 18 from Vox, 19 fro AS 2024-08-19 15:54:37,967 INFO [optim.py:476] (3/4) Clipping_scale=2.0, grad-norm quartiles 1.798e+01 2.373e+01 2.637e+01 2.974e+01 3.781e+01, threshold=5.275e+01, percent-clipped=0.0 2024-08-19 15:54:51,065 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9550, loss[loss=0.1126, beats_loss=0.00771, ecapa_loss=0.0001661, whisper_loss=0.1033, over 17813.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01044, ecapa_loss=0.000142, whisper_loss=0.08971, over 3923084.97 frames. 
], batch size: 70, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:55:13,152 INFO [train_multi_KD3.py:844] (3/4) A total of 64 cuts. 17 from LS+wenet, 23 from Vox, 24 fro AS 2024-08-19 15:55:14,675 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 28 from LS+wenet, 21 from Vox, 41 fro AS 2024-08-19 15:55:23,800 INFO [train_multi_KD3.py:844] (3/4) A total of 75 cuts. 22 from LS+wenet, 26 from Vox, 27 fro AS 2024-08-19 15:55:29,185 INFO [train_multi_KD3.py:844] (3/4) A total of 83 cuts. 23 from LS+wenet, 25 from Vox, 35 fro AS 2024-08-19 15:55:35,244 INFO [train_multi_KD3.py:844] (3/4) A total of 62 cuts. 13 from LS+wenet, 15 from Vox, 34 fro AS 2024-08-19 15:55:52,091 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=15.0 2024-08-19 15:56:12,930 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4541690.0, ans=0.1 2024-08-19 15:56:16,440 INFO [train_multi_KD3.py:844] (3/4) A total of 59 cuts. 26 from LS+wenet, 12 from Vox, 21 fro AS 2024-08-19 15:56:36,996 INFO [train_multi_KD3.py:844] (3/4) A total of 90 cuts. 27 from LS+wenet, 16 from Vox, 47 fro AS 2024-08-19 15:56:38,092 INFO [train_multi_KD3.py:1116] (3/4) Epoch 31, batch 9600, loss[loss=0.1015, beats_loss=0.01346, ecapa_loss=0.0001094, whisper_loss=0.08697, over 23192.00 frames. ], tot_loss[loss=0.1016, beats_loss=0.01041, ecapa_loss=0.0001412, whisper_loss=0.08981, over 3880782.89 frames. ], batch size: 90, lr: 1.97e-03, grad_scale: 2.8823037615171174e+17 2024-08-19 15:56:55,820 INFO [train_multi_KD3.py:844] (3/4) A total of 68 cuts. 
17 from LS+wenet, 22 from Vox, 29 fro AS 2024-08-19 15:57:24,859 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4542090.0, ans=0.0 2024-08-19 15:57:26,964 INFO [scaling.py:1024] (3/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.91 vs. limit=15.0 2024-08-19 15:57:45,473 INFO [scaling.py:214] (3/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=4542190.0, ans=0.0 2024-08-19 15:58:00,911 INFO [train_multi_KD3.py:844] (3/4) A total of 63 cuts. 24 from LS+wenet, 15 from Vox, 24 fro AS